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ABSTRACT 


This  thesis  explores  the  “Infomax”  method  of  Independent  Component  Analysis 
(ICA)  to  accomplish  blind  source  separation  (BSS).  The  Infomax  method  separates 
unknown  source  signals  from  a  number  of  signal  mixtures  by  maximizing  the  entropy  of 
a  transformed  set  of  signal  mixtures  and  is  accomplished  by  performing  gradient  ascent 
in  MATLAB.  This  work  specifically  focuses  on  small  numbers  of  two  types  of  signals: 
audio  signals  and  simple  communications  signals  (polar  non-retum  to  zero  signals).  The 
Infomax  method  is  found  to  be  successful  and  efficient  only  for  small  numbers  of  signals, 
and  improvements  to  the  gradient  ascent  algorithm  should  be  made  for  the  Infomax 
algorithm  to  succeed  for  more  than  three  signal  mixtures.  MATLAB  implementation 
code  is  included  as  appendices. 
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EXECUTIVE  SUMMARY 


As  the  use  of  wireless  communications  expands,  more  signals  are  introduced  into 
the  environment.  The  pervasiveness  of  these  signals  results  in  overcrowding  in  the 
spectrum  and  an  increasing  number  of  overlapping  signals.  Multiple  signals  overlapping 
in  time  and  frequency  create  co-channel  interference.  When  superimposed  signals  are 
received,  they  are  generally  difficult  to  demodulate  due  to  the  influence  of  the  interfering 
signal  on  the  decision  statistics  in  the  receiver,  resulting  in  inaccurate  demodulation. 

In  military  applications,  the  ability  to  correctly  demodulate  received  signals 
affects  friendly  communications  capabilities  as  well  as  hostile  threat  assessments.  Co¬ 
channel  signals  are  often  received  as  signal  mixtures,  although  the  nature  of  the  source 
signals  and  the  mixing  process  is  usually  unknown.  The  problem  of  finding  original 
signals  from  a  mixture  of  signals  is  called  blind  source  separation. 

Blind  source  separation  (BSS)  is  a  tenn  used  to  describe  the  method  of  extrication 
of  underlying  source  signals  from  a  set  of  observed  signal  mixtures  with  little  or  no 
information  as  to  the  nature  of  those  source  signals.  Blind  source  separation  has  a  variety 
of  applications,  including  neural  imaging,  economic  analysis,  and  signal  processing. 
Independent  Component  Analysis  (ICA)  is  a  method  of  finding  unknown  source  signals 
from  signal  mixtures,  and  it  is  just  one  of  many  solutions  to  the  BSS  problem. 

Just  as  ICA  is  one  of  many  methods  of  resolving  signal  mixtures  into  their 
original  component  source  signals,  many  approaches  have  been  developed  to  perform 
ICA.  This  research  focuses  on  the  “Infomax”  algorithm,  which  finds  a  number  of 
independent  source  signals  from  the  same  number  of  signal  mixtures  by  maximizing  the 
entropy  of  the  signals. 

Entropy  is  basically  the  average  information  obtained  when  the  value  of  a  random 
variable  is  found,  and  Infomax  is  based  on  the  fact  that  the  maximum  entropy  of  joint 
continuous  random  variables  occurs  only  when  the  random  variables  are  statistically 


xvii 


independent.  Therefore,  if  entropy  is  maximized,  the  resulting  signals  must  be 
independent.  If  the  contributing  signals  are  independent,  then  these  independent  signals 
must  be  the  original  source  signals. 

The  Infomax  algorithm  achieves  the  maximum  entropy  of  a  function  using 
gradient  ascent,  an  iterative  process  of  taking  a  “step”  in  the  direction  of  maximum 
gradient  until  a  local  maximum  is  reached.  If  this  process  is  repeated  sufficiently,  the 
global  maximum  will  eventually  be  found.  When  the  global  maximum  of  entropy  is 
found  using  gradient  ascent,  entropy  has  been  maximized,  and  the  resulting  signals  are 
the  source  signals. 

In  this  research,  the  Infomax  algorithm  was  implemented  in  MATLAB,  and  the 
Infomax  theory  was  first  tested  on  audio  signals.  For  small  numbers  of  signal  mixtures 
(two  to  three),  the  Infomax  algorithm  was  found  to  be  rather  efficient.  As  the  number  of 
signal  mixtures  increased,  however,  the  complexity  of  the  algorithm  increased  and 
efficiency  decreased.  The  algorithm  was  then  adapted  to  a  simple  communications 
signal,  the  polar  non-return  to  zero  (NRZ)  waveform.  Two  methods  of  modifying  the 
Infomax  algorithm  to  a  different  signal  type  were  tested,  and  the  superior  method  was 
chosen  to  run  simulations.  Again,  the  Infomax  algorithm  proved  to  be  efficient  in 
extracting  small  number  of  signals  (two  to  three)  from  the  same  number  of  signals 
mixtures.  As  the  number  of  signals  increased,  the  complexity  of  the  algorithm  increased, 
resulting  in  longer  and  less  accurate  computations. 

The  Infomax  method  of  ICA  was  found  to  be  quite  successful  in  solving  the  blind 
source  separation  problem  for  small  numbers  of  sources  (both  audio  signals  and  polar 
NRZ  signals).  This  research  could  be  extended  to  larger  numbers  of  signals,  as  well  as 
signals  of  many  different  types.  Improvements  to  the  gradient  ascent  algorithm  could 
improve  the  Infomax  algorithm’s  perfonnance  in  both  of  these  cases.  Additionally,  the 
many  other  methods  of  ICA  could  be  tested  and  compared  for  consistency,  efficiency, 
and  accuracy.  The  applicability  of  this  topic  to  a  variety  of  research  fields  creates 
abundant  opportunities  for  future  work:  a  very  exciting  future  for  breakthroughs  in  ICA 
as  a  method  of  Blind  Source  Separation. 


I. 


INTRODUCTION 


As  the  use  of  wireless  communications  expands,  more  signals  are  introduced  into 
the  environment.  The  pervasiveness  of  these  signals  results  in  overcrowding  in  the 
spectrum  and  an  increasing  number  of  overlapping  signals.  Multiple  signals  overlapping 
in  time  and  frequency  often  create  co-channel  interference  [8].  When  these 
superimposed  signals  are  received,  they  are  generally  difficult  to  demodulate  due  to  the 
influence  of  the  interfering  signal  on  the  decision  statistics  in  the  receiver  [10],  resulting 
in  inaccurate  demodulation. 

In  military  applications,  the  ability  to  correctly  demodulate  received  signals 
affects  friendly  communications  capabilities  as  well  as  hostile  threat  assessments.  Co¬ 
channel  signals  are  often  received  as  signal  mixtures,  although  the  nature  of  the  source 
signals  and  the  mixing  process  is  usually  unknown.  The  problem  of  finding  original 
signals  from  a  mixture  of  signals  is  called  blind  source  separation. 

A.  BLIND  SOURCE  SEPARATION 

Blind  source  separation  (BSS)  is  a  tenn  used  to  describe  the  method  of  extrication 
of  underlying  source  signals  from  a  set  of  observed  signal  mixtures  with  little  or  no 
information  as  to  the  nature  of  those  source  signals.  Blind  source  separation  has  a  variety 
of  applications,  including  neural  imaging,  economic  analysis,  and  signal  processing.  A 
classic  example  of  blind  source  separation  is  the  “cocktail  party  problem  [12],  [6].” 

1.  The  Cocktail  Party  Problem 

The  cocktail  party  problem  considers  the  example  of  a  room  full  of  people 
speaking  simultaneously.  Microphones  (equal  to  the  number  of  people  in  the  room)  are 
scattered  throughout  the  room,  and  each  microphone  records  a  mixture  of  all  the  voices  in 
the  room.  The  problem,  then,  is  to  separate  the  voices  of  the  individual  speakers  using 
only  the  recorded  mixtures  of  their  voices.  A  simplified  version  of  the  cocktail  party  is 
illustrated  in  Figure  1 . 
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Figure  1  The  “Cocktail  Party”  problem. 


While  this  figure  is  simplified  in  that  there  are  only  four  “attendees”  at  the  “party”,  it  is 
also  evident  how  complicated  the  problem  becomes  as  the  number  of  source  signals  and 
signal  mixtures  increases.  Independent  Component  Analysis  (ICA)  is  one  of  many 
methods  of  addressing  the  problem  of  blind  source  separation. 

B.  OBJECTIVES  AND  OUTLINE 

This  thesis  is  intended  to  provide  a  framework  for  future  research  into  the  topic  of 
Independent  Component  Analysis  at  the  Naval  Postgraduate  School.  Research  objectives 
are  as  follows. 
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1. 


Research  ICA  Methods 


A  variety  of  ICA  methods  were  reviewed,  and  a  single  method  was  chosen  for 
further  analysis.  Chapter  II  contains  a  history  of  ICA  and  a  literature  review.  The 
Infomax  method  was  chosen  based  on  the  more  comprehensive  introductory  material 
available  (mainly  in  the  fonn  of  Stone’s  book  [12]),  as  well  as  the  similarity  in  its 
resultant  equation  to  other  methods. 

2.  Analyze  Infomax  Method 

Chapter  III  presents  a  detailed  analysis  of  the  Infomax  method  of  ICA.  The  main 
concepts  of  Infomax  are  defined,  and  the  equations  necessary  for  the  implementation  of 
the  Infomax  algorithm  are  derived. 

3.  Implement  Infomax  Algorithm  in  MATLAB 

Basic  MATLAB  code  for  the  Infomax  method  is  provided  in  Appendix  D  of  [12], 
and  this  code  was  modified  and  adapted  for  convenience  and  expansion  of  applications. 
Chapter  IV  describes  the  process  of  tailoring  and  improving  code  for  the  basic  algorithm, 
as  well  as  describing  the  results  of  the  algorithm. 

4.  Draw  Infomax  Conclusions  and  Outline  Future  Research 

The  capabilities  of  the  Infomax  method  as  it  relates  to  signal  processing  are 
addressed  in  Chapter  V,  and  the  algorithm  was  found  to  be  very  successful  in  extracting 
small  numbers  of  signals.  Potential  future  research  topics,  including  adapting  the 
algorithm  for  larger  numbers  of  signals  and  testing  on  different  types  of  signals,  are 
addressed  as  well. 
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II. 


BACKGROUND 


where  the  subscripts  of  x  and  y  indicate  the  signal  number  and  the  superscripts  are  the 
time  index. 

The  principle  of  ICA  was  first  developed  in  the  early  1980’s  by  Herault,  Jutten, 
and  Ans,  neurophysiologists  studying  muscle  contraction.  They  observed  two  responses 
Xj (t)  and  x2 (/) ,  from  which  they  obtained  position  and  velocity  signals  y1  (t)  and  y2 (t) 

[6].  Utilizing  a  technique  of  non-linear  decorrelation,  they  showed  that  independent 
components  could  be  found  as  “nonlinearly  uncorrelated  linear  combinations”  [6],  While 
this  is  the  first  known  adaptation  of  ICA,  it  focused  only  on  two  signals  and  is  less 
efficient  than  the  more  modem  approaches  [6].  Other  early  work  on  ICA  included  the 
work  of  Cichocki  and  Unbehauen,  who  developed  the  algorithm  to  solve  Equation  (1) 
and  applied  it  to  neural  networks  [4]. 

In  the  early  1990’s,  Principal  Component  Analysis  (PCA)  and  Projection  Pursuit, 
both  similar  methods  to  ICA,  were  applied  to  the  blind  source  separation  problem 
[6]  [12].  Principal  Component  Analysis  seeks  sources  that  are  Gaussian  and  uncorrelated, 
rather  than  the  non-Gaussian,  independent  sources  of  ICA  [12].  As  uncorrelatedness  is 
not  as  strong  a  property  as  independence,  PCA  is  not  as  robust  as  ICA  but  is  less 
computationally  challenging.  Additionally,  PCA  can  be  utilized  as  a  precursor  to  the 
ICA  algorithm  [6].  Projection  Pursuit  seeks  one  independent  component  at  a  time  from  a 
set  of  mixtures  by  maximizing  kurtosis  to  find  the  most  non-Gaussian  signal  [12].  This 
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differs  from  ICA  in  that  projection  pursuit  extracts  one  signal  from  a  mixture  of  M 
signals  at  a  time,  while  ICA  extracts  all  M  signals  at  once  [12]. 

In  1995,  A.J.  Bell  and  T.J.  Sejnowski  developed  an  “information-maximization 
approach  to  blind  separation  and  blind  deconvolution,”  which  is  now  referred  to  as 
Infomax  [2].  Infomax  is  a  method  of  finding  mutually  independent  signals  by 
maximizing  information-flow,  or  entropy.  Infomax  yields  the  same  result  as  another  ICA 
method,  Maximum  Likelihood  Estimation  (MLE)  [6],  [12].  MLE  optimizes  parameter 
values  to  find  the  best  fit  of  observed  data  given  some  model  (MLE  optimizes  W  to  find 
the  best  fit  of  the  extracted  signals  y  to  the  source  signals  s)  [6],  [12].  Both  Infomax 
and  MLE  make  several  assumptions  about  the  source  signals,  the  validity  of  which  is 
explored  in  Chapter  III. 

In  Infomax  and  MLE,  as  well  as  PC  A  and  projection  pursuit,  optimization  is 
achieved  by  gradient  ascent.  Gradient  ascent  is  an  optimization  method  that  maximizes  a 
function  of  multiple  parameters  by  iteratively  improving  an  initial  guess  using  the 
gradient,  which  points  in  the  direction  of  maximum  slope  [12].  A  disadvantage  of  this 
method  is  that  if  the  step  size,  or  learning  rate,  is  not  chosen  carefully,  the  function  does 
not  converge  properly.  Additionally,  the  Gradient  Ascent  method  only  finds  a  local 
maximum,  so  if  the  initial  starting  value  is  closer  to  a  local  maximum  than  the  global 
maximum,  the  algorithm  does  not  converge  to  find  the  maximum  function  value.  Either 
of  these  situations  can  produce  erroneous  results  [6]. 

Slightly  more  advanced  methods  of  ICA  include  complexity  pursuit  and  FastICA. 
Complexity  Pursuit  describes  a  method  of  ICA  that  extracts  signals  with  the  least 
complexity,  as  a  mixture  of  signals  will  be  more  complex  than  any  of  its  source  signals 
[12].  FastICA  describes  a  fixed  point  algorithm  that  can  be  applied  (in  lieu  of  gradient 
ascent)  to  perform  more  efficient  calculations  [6]. 

Relatively  recent  extensions  of  the  ICA  model  include  applications  for  noisy 
environments,  cases  in  which  there  are  fewer  mixtures  than  independent  components,  and 
circumstances  where  convolution  is  incorporated  in  the  creation  of  the  mixtures. 
Considerations  have  also  been  given  to  nonlinear  mixing  processes  and  situations  where 
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the  components  are  Gaussian  and  have  time  dependencies.  Of  particular  interest  is  the 
application  of  ICA  to  telecommunications.  Uses  of  ICA  and  BSS  in  code-division 
multiple  access  (CDMA)  have  been  explored.  This  thesis  focuses  mainly  on  the  Infomax 
and  Gradient  Ascent  methods,  and  the  other  methods  and  extensions  of  ICA  were  not 
pursued  [6]. 

This  chapter  introduced  the  background  of  ICA  as  a  method  of  BSS.  In  Chapter 
III,  the  Infomax  algorithm  and  its  implementation  are  analyzed. 
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III.  ANALYSIS 


In  this  chapter,  one  independent  component  analysis  algorithm  and  its 
implementation  are  presented.  The  mathematics  in  this  chapter  is  mostly  adapted  from 
James  V.  Stone’s  book,  “Independent  Component  Analysis:  A  Tutorial  Introduction” 
[12].  Note  that  in  this  chapter,  as  in  [12],  the  notation  y  represents  a  vector  function  of 

time  while  y'  represents  a  sample  of  that  vector  function  at  specific  time  t . 

A.  INFOMAX  STRATEGY 

Infomax  is  a  method  of  ICA  grounded  in  infonnation  theory  which  aims  to  find 
independent  source  signals  by  maximizing  entropy.  The  details,  calculations,  and 
assumptions  involved  with  the  Infomax  method  are  discussed  later  in  this  chapter,  but  the 
general  strategy  of  Infomax  begins  with  Equation  (1),  where  the  extracted  signals  y  are 
obtained  from  signal  mixtures  x  by  optimizing  an  unmixing  matrix  W .  Infomax  holds 
that  the  extracted  signals  are  source  signals  if  they  are  mutually  independent.  While 
independence  of  the  signals  cannot  be  measured,  entropy  can.  Entropy  is  related  to 
independence  in  that  maximum  entropy  implies  independent  signals.  Therefore,  the 
objective  of  ICA  is  to  find  the  unmixing  matrix  W  that  maximizes  the  entropy  in  the 
extracted  signals  y . 

Entropy  of  the  signal  mixtures  x  is  constant,  but  the  change  in  entropy  can  be 
maximized  by  mapping  the  signals  y  =  Wx  to  an  alternate  set  of  signals 
Y  =  g(y)  =  g(Wx)  .  This  mapping  spreads  out  Y  so  that  the  change  in  entropy  from 
x  — ^  Y  can  be  maximized  by  optimizing  the  unmixing  matrix  W ,  and  when  entropy  is 
maximized,  the  resulting  signals  are  independent.  The  inverse  y  =  g  1  ( V)  is  then  taken, 
resulting  in  extracted  signals  y  that  are  also  independent.  Since  the  extracted  set  of 
signals  y  are  independent,  they  must  be  the  original  source  signals  s .  The  Infomax 
strategy  is  depicted  graphically  in  Figure  2. 
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Figure  2  Infomax  strategy. 


The  aforementioned  strategy  is  based  on  four  properties  of  bounded  signals  which 
are  discussed  in  detail  in  Section  C.l  and  involve  the  topics  mutual  independence, 
invertible  functions,  inverse  functions,  and  entropy.  Mutual  independence  means  that  the 
outcome  of  one  event  has  no  effect  on  the  outcome  of  another.  A  function  y  =  f  (x)  is 
invertible  if  every  value  of  x  corresponds  to  only  one  value  of  y .  Entropy  is  discussed 
in  the  following  section. 

B.  ENTROPY 

In  1948,  Claude  Shannon  introduced  the  concept  of  information  entropy  as  a 
measure  of  uncertainty  associated  with  a  random  variable  [9].  Stone  describes  entropy  as 
a  “measure  of  uniformity  of  distribution  such  that  complete  uniformity  equals  maximum 
entropy.”  Other  descriptions  include  average  surprise,  or  average  information  [13]. 

1.  Information 

The  information  associated  with  the  occurrence  of  event  A  is  defined  as 

I  {A)  =  Inf  — r— 1  =  -ln(Pr[A]).  (3) 

v  “rL^Jy 

Note  that  the  base  of  the  logarithm  is  arbitrary,  but  the  natural  logarithm  is  used  in  this 
thesis  for  mathematical  convenience.  Therefore,  the  units  of  infonnation  are  nats.  If  the 
probability  of  event  A  occurring  is  high  (Pr[A]«l),  then  it  contains  very  little 
information: 

I  {A)  =  -ln(Pr[A])  «  -ln(l)  «  0 .  (4) 

Conversely,  if  the  probability  of  an  event  is  very  low  (Pr[A]  «  0) ,  then  it  contains  infinite 
information: 

10 


1(A)  =  -ln(Pr[A])  «  -ln(0)  «  oo  (5) 

Entropy  is  average  information,  which  can  be  obtained  from  the  expectation.  An 
expectation  is  essentially  a  weighted  average  and  is  defined  in  [13]  as 

£{A}  =  ]Tx(s)Pr[Y].  (6) 

The  entropy  H,  or  expected  information,  is  then 

H{A)  =  E{I(A)}  =  ^I[  4]Pr[4]  (7) 

i 

where  i  represents  an  arbitrary  number  of  events.  For  this  arbitrary  number  of  events, 
entropy  H(A )  is  obtained  by  substituting  Equation  (3)  into  Equation  (7),  resulting  in 

H{A)  =  E{-ln(Pr[A])}  =  X-ln(Pr[4)Pr[4]  (8) 

i 

which  can  be  rearranged  to  the  form 

H(A)  =  -XPr[4]ln(Pr[4).  (9) 

i 

Equation  (9)  is  the  fonnal  definition  of  entropy  for  a  set  of  events  [13]. 

2.  A  Two-Event  Example 

In  circumstances  where  there  are  only  two  possible  outcomes  (n= 2)  for  an  event 
(Yes/No,  Heads/Tails,  0/1,  etc.),  the  probabilities  of  the  two  events  sum  to  one,  and 

Pr[4]  +  Pr[4]  =  l.  (10) 

Define 

Vx\A^  =  p  and  Pr[A2]  =  l-/i.  (11) 

For  the  two  event  example,  Equation  (9)  can  be  expressed  as 

i/(^)  =  -(Pr[4]ln(Pr[4])  +  Pr[4]ln(Pr[4]))  (12) 

and  substitution  of  Equation  (11)  into  Equation  (12)  yields 

H{p)  =  -pHp)-(\-p)\n{\-p).  (13) 

Entropy  H(p) ,  where  p  takes  on  the  values  of  probability  ranging  from  zero  to 
one  ( 0  <  p  <  1 ),  is  shown  in  Figure  3.  It  is  evident  from  the  plot  that  the  maximum  value 
of  entropy  is  obtained  when  p  =  0.5 ,  which,  for  example,  could  be  a  fair  coin  where  a 
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head  is  as  likely  to  result  as  a  tail.  The  probability  mass  function  of  a  fair  coin  resembles 
a  uniform  probability  density  function  (pdf),  illustrating  that  signals  with  a  uniform  pdf 
have  maximum  entropy. 


Entropy  can  also  be  expressed  as  a  continuous  random  variable  in  analogy  with 
Equation  (9)  as  the  limit  n  — >  co .  The  entropy  of  a  continuous  random  variable  A  is 
defined  as 

H (^)  =  “i  x  Pa^^Pa^)^  =e{-^{Pa(A))}  •  (14) 

All  expectations  can  be  approximated  by  averaging  a  reasonably  large  number  of 
trials.  Applying  this  to  Equation  (14),  we  get 

H(A)  =  -^-fj\apMt),  (15) 

™  t 

where  t  is  a  time  sample  and  N  is  the  number  of  time  samples.  This  is  the  definition  of 
entropy  that  is  utilized  in  Infomax. 
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3.  Entropy  of  a  Univariate  Probability  Density  Function 

Since  the  Infomax  method  obtains  mutually  independent  signals  by  maximizing 
entropy,  and  the  entropy  of  the  signal  mixtures  x  is  constant,  an  expression  for  entropy 
of  a  transfonned  signal  Y  is  necessary  so  that  the  change  in  entropy  can  be  maximized. 
A  simplified  expression  of  entropy  can  be  obtained  by  considering  the  univariate  case, 
where  the  signal  contains  only  one  dependent  variable.  In  this  case,  x  is  a  random  vector 
and  each  element  of  x  is  a  different  signal  sampled  at  the  same  time  t . 

From  Equation  (15),  entropy  of  a  signal  Y  is 

<>6> 

where  Y  =  g(y ) ,  and  y  is  scalar  function  of  time  [y  =  y(t)  =  y*  j ,  and  the  superscript  t 
indicates  the  scalar  value  of  y  at  time  t.  The  function  g(y)  is  the  cumulative 
distribution  function  (cdf)  of  the  desired  signal  y  and  is  often  referred  to  as  the  “model 
cdf  ’  of  the  source  signals  as  it  is  chosen  to  extract  a  desired  type  of  source  signal.  This  is 
described  in  the  following  section. 


The  univariate  case  is  explored  by  modifying  Equation  (1)  to  y  =  wrx,  where 

wr  is  a  single  row  of  the  unmixing  matrix  W ,  and  x  is  a  vector  representing  a  snapshot 
of  M  signals  in  time,  as  shown  in  vector  matrix  notation.  That  is,  for 

V  =  wrx  (17) 


where 


then 


x[ 

wn  • 

•'  wim1 

x'2 

and 

*£ 

>21 

W22  ' 

>vr 

(18) 

x1 

J 

_WM1 

WM2  ■ 

W  MM  _ 

y  =  wrx  =  [ 


w. 


21 


W. 


22 


W. 


2  M 


X, 


(19) 
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Vector  multiplication  yields 


/  =  W21V  +  w22x2  +  •  •  •  +  w2Mx^ ,  (20) 

where  y‘  is  a  scalar  value  of  one  signal  sampled  at  time  t .  The  transformation  of  y‘ 
through  the  model  cdf  g(y')  yields  the  mapped  value  of  Y ,  where  7  is  a  random 
variable  on  the  range  from  zero  to  one;  i.e., 

Y  =  g(yt)  =  g(w2X+w22x'2+---  +  w2Mx'M)  (21) 

From  Equation  (16),  pY{Y')  is  the  pdf  of  the  mapped  signal  Y  =  g(y)  and  is 
related  to  the  pdf  of  the  extracted  signal  y,  py(y')  ■>  as  shown  in  Figure  4. 


Figure  4  Transformation  of  y  to  Y,  From  Ref.  [12]. 


Figure  4  illustrates  how  a  signal  y  =  (y y 5000  )  (C)  can  be  used  to  approximate 

its  pdf  (D).  The  transformation  of  its  pdf  through  its  cdf  (B)  yields  a  uniform  distribution 
(A).  From  Figure  4, 

Py(Y‘)AY  =  py (y ' ) Ay  .  (22) 
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By  rearranging  Equation  (22),  we  get 


Since 


Equation  (23)  becomes 


pY(Yt)  =  py(yt 


py(y‘) 

A  Y 


AT  dY 

- >  — 

Ay  dy 


as 


Ay  ->  0 , 


(23) 


(24) 


p  (y‘) 

Py(Y  )  ~~dY~ '  (25) 

dy 

The  magnitude  of  the  denominator  of  Equation  (25)  is  taken  into  account  for 
monotonically  increasing  and  decreasing  functions,  resulting  in 


pA?) 


py(y‘ ) 

dY_ 

dy 


(26) 


Since  Y  =  g(v)  where  g(v)  is  the  model  cdf  of  the  source  signal,  then  —  =  g'(y)  ,  and 

dy 


g  '(y)  is  the  pdf  of  the  source  signal  ps  (y)  .  Substituting  this  result  into  Equation  (26), 


we  obtain 


pA?) 


Py(y') 

ps(y‘ ) 


(27) 


Substituting  Equation  (27)  into  Equation  (16),  we  get  a  univariate  expression  for  entropy 
in  terms  of  the  pdfs  of  the  source  and  extracted  signals: 


H{Y)  = 


If >sE2 

n,  p,(y‘) 


(28) 


To  solve  Equation  (28),  an  expression  for  the  pdf  of  the  extracted  signal  p  (y)  is 


necessary,  but  as  the  discussion  of  the  univariate  case  is  meant  as  an  introduction  to  the 
multivariate  case,  this  is  addressed  in  the  next  section. 
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4.  Entropy  of  a  Multivariate  pdf 

The  univariate  model  can  be  extended  to  a  general  case  in  which  there  is  more 
than  one  random  variable.  From  Equation  (14),  Entropy  H  is  equal  to 

H(A)  =  E{-\n(pA{aj)}.  (29) 

This  can  be  represented  in  vector  notation  for  multiple  variables,  where 

A  =  { Al,A2,---,AM}  and  a  =  {a1,a2,—,aJf}.  The  resulting  multivariate  expression  for 
entropy  is 

=  E  {-ln(/>A(a))}  (30) 

where  pA( a)  is  the  multivariate  pdf  of  random  vector  A  .  If  each  a,  is  independent  and 
identically  distributed,  then 

M 

Pa (a)  =  Pa  Oi )Pa («2 ) '  • '  Pa (aM )  =  Yl PA(ai )  •  (3 1 ) 

(=i 

The  natural  log  of  the  multivariate  pdf  is 

f  M  h 

ln(^A(a))  =  ln  Y^Pa^i)  =  ^{Pa^Pa^i)-" Pa^m))  ■  (32) 

V  i= 1  J 

From  the  properties  of  logarithms,  the  log  of  a  product  is  equal  to  the  sum  of  the  logs,  so 
ln(^4(ai)^(o2)---^(aM))  =  ln(/?,(ai))  +  ln(^4(o2))  +  ---  +  ln(^(oM)).  (33) 

Equation  (33)  can  be  rewritten  as 

M 

Eln(^^))-  (34) 

1=1 

Substituting  Equation  (34)  into  Equation  (30),  we  get  the  resulting  expression  for  entropy 
as 

H  (A)  =  E  In  (Pa  (ai  ))|  •  (35) 

The  expectation  can  be  estimated  by  taking  an  average,  which  yields  an  expression 
similar  to  the  univariate  result  in  Equation  (15): 

1  M  N  i  N 

^(A)=-— EEln(^(«, •))=-— Eln(^(a'))  (36) 

^  i= i  /= i  w  t= i 
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If  Equation  (36)  is  applied  to  the  mapped  signal  Y  transfonned  from  the  model 
cdf  Y  =  g(y)  =  g(Wx)  ,  the  multivariate  expression  of  entropy  of  signals  Y  becomes 

^(Y)  =  -4Zln(MY')),  (37) 

Jy  t= i 

which  is  the  multivariate  form  of  the  univariate  expression  in  Equation  (16). 


As  in  the  univariate  case,  an  expression  is  needed  for  the  joint  pdf  pY( Y')  and  is 
obtained  by  adapting  Equation  (26)  for  the  multivariate  case,  resulting  in 


M Y)  = 


Pyi  y) 

5Y 

0y 


(38) 


The  denominator  of  Equation  (38)  is  the  Jacobian,  which  is  examined  in  more 
detail  in  the  following  section.  Following  the  logic  in  the  univariate  case,  since 
Y  =  g(y),  where  g(y)  is  the  model  cdf  of  the  source  signals,  then  SY/Sy  is  the  pdf 


g'(y)  of  the  source  signals,  which  can  also  be  expressed  as  ps(  y)  .  Equation  (38)  can  be 


rewritten  as 


Py(  y)  = 


Pyt  y) 

Pst y) 


(39) 


Substitution  of  Equation  (39)  into  Equation  (37)  results  in  a  multivariate  expression  for 
entropy  in  terms  of  both  the  source  signal  pdf  ps( y)  and  the  extracted  signal  pdf  p  (y)  : 


1  N 

H(  Y)  = - Yin 

NU 


PytY) 

Pstf) 


(40) 


As  in  the  univariate  expression  of  entropy  in  Equation  (28),  this  multivariate 
expression  of  entropy  requires  an  expression  for  the  pdf  of  the  extracted  signal  pv( y)  . 


To  obtain  this  expression,  the  relationship  in  Equation  (38),  which  is  true  for  any 
invertible  function,  is  taken  into  account.  The  pdf  p  ( y)  of  the  extracted  signal  y  =  Wx 


can  be  expressed  as 


Pyt  y)  = 


P.M) 

8y_ 

dx 


(41) 
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As  in  Equation  (38),  the  denominator  of  Equation  (41)  is  the  Jacobian. 


a.  The  Jacobian 

The  Jacobian  J  is  a  scalar  value  which  is  the  detenninant  of  an  M  x  M 
Jacobian  matrix  J  of  partial  derivatives  [1].  If  M  =  2 ,  then 


dvi 

dx1 

dx2 

dy2 

fix. 

dx2 

5x 


Since  the  Jacobian  J  is  the  detenninant  of  the  Jacobian  matrix  J  [1], 

dyx  dy2  dyx  dy2 


J=  J  = 


dxx 

8x2 

fy2 

8y2 

dxx 

dx2 

dxx  dx2  dx2  dxx 


(42) 


(43) 


b.  The  Jacobian  and  the  Unmixing  Matrix 

An  example  with  M  =  2  is  used  to  illustrate  the  relationship  between  the 
Jacobian  matrix  J  and  the  unmixing  matrix  W . 

From  Equation  (1),  y  =  Wx  ,  where 


y  = 

A 

w 

wn 1 

X  = 

X1 

-Jhj 

_W2 1 

W22l 

_x2_ 

Substituting  these  values  into  Equation  (1),  we  get 


Ti 

wn 

™12 

*1 

_w21 

W22. 

_x2_ 

Evaluation  of  this  equation  shows  that 


and 


y1=wnxl+w12x2 


y2=w21x1+w22x2. 


(44) 


(45) 


(46) 

(47) 
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From  Equation  (42),  the  Jacobian  matrix  requires  expressions  for  dyl/dx 1?  dyjdx2 , 
dy2/dx j  ,  and  Gy:/Gx2  .  The  first  two  partial  derivatives  are  obtained  from  Equation  (46), 
while  the  second  two  are  obtained  from  Equation  (47): 


dx , 


=  w, 


(48) 


dx. 


=  w, 


12 


(49) 


^2 

3x, 


■  =  w 
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(50) 


<^2 

dx, 


=  w. 


22 


(51) 


Substitution  of  Equations  (48)  -  (51)  into  the  Jacobian  matrix  in  Equation  (42)  yields  the 
Jacobian  matrix 


(52) 


Equation  (52)  is  identical  to  the  expression  for  W  used  in  Equation  (44).  Therefore, 


J  = 


Wn  wI2 


=  W. 


The  determinant  of  J  is  then  equal  to  the  determinant  of  W ,  so  from  Equation  (43) 

J  =  |J|  =  |W| 


(53) 


(54) 


The  multivariate  expression  of  entropy  in  Equation  (40)  requires  an  expression  for 
p  ( y) ,  where  p  (y)  =  /?v(x)/|5y/5x|  as  shown  in  Equation  (41)  .  As  |<3y/<3x|  is  the 
Jacobian,  from  Equation  (54) 

Idyl 


5x 


=  J=  w 


and  the  pdf  of  the  extracted  signal  can  be  rewritten  as 

PM) 


pv(  y)  =  - 


w 


(55) 


(56) 
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This  result  leads  to  the  expression  of  entropy  used  in  the  Infomax  algorithm. 


5.  INFOMAX  Expression  for  Entropy 

Substituting  the  expression  for  the  pdf  of  the  extracted  signals  found  in  Equation 


(56)  into  the  expression  for  entropy  in  Equation  (40),  we  get 

r  \ 


1  N 

H(  Y)  = - Yin 

Ntt 


pM  ) 

W|/>,(y'), 


(57) 


From  the  properties  of  logarithms,  this  equation  can  be  rewritten  as 


//<Y)=-iE(lnA(x')-|nlwl-lnR(y'))- 


JVt, 

Distribution  of  the  summation  results  in  the  form 

N  i  N 


"(Y>=YI>A(s'>Yi>A(>'')+i'iiwi- 

N  t= 1  N  t= 1 


(58) 


(59) 


When  the  first  part  of  Equation  (59)  is  compared  to  the  expression  of  entropy  in  Equation 
(37),  it  is  recognized  as  the  entropy  of  X  : 


1  N 

TV  f=1 


Equation  (59)  can  now  be  rewritten  as 


1  w 

H(  Y)  =  H(X)  +  -  X ln  Ps  (y' )  +  In  [ W| . 

A  t= i 


(60) 


(61) 


Since  the  unmixing  matrix  W  that  maximizes  the  entropy  H(Y )  does  not  affect  the 
entropy  // (X) ,  // (X)  can  be  ignored,  meaning  that  the  unmixing  matrix  that  maximizes 
Equation  (61)  also  maximizes 


Y>  =  -Jr  x;  In  (y '  >  +  In  I  W| 

N  t=  1 


(62) 


where  t  is  the  time  index.  Equation  (62)  can  be  modified  further  by  ignoring  the  ordering 
of  the  signals  M  ,  resulting  in 


1  M  N 

*<Y)  =  ^II>A«>  +  lnlWi 

2V  7=i  /=i 


(63) 
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The  W  that  maximizes  Equation  (63)  maximizes  the  entropy  in  Y  ,  implying  the 
rows  of  Y  are  independent.  Since  y  is  the  inverse  of  Y  ,  this  implies  the  rows  of  y  are 
independent,  which  implies  that  W  is  the  unmixing  matrix  that  yields  the  original 
signals.  Equation  (63)  is  the  fundamental  equation  used  in  the  Infomax  algorithm  and  is 
based  on  several  assumptions. 

C.  PROPERTIES  AND  ASSUMPTIONS 

The  process  shown  in  Section  B  is  based  on  the  following  properties  of  bounded 
signals  and  assumptions. 

1.  Properties 

a.  Bounded  Signals  with  a  Uniform  pdf  Have  Maximum  Joint 
Entropy 

This  is  addressed  in  detail  in  Section  B.2.  Equation  (28)  at  the  end  of 
section  B.3  demonstrates  this  property  as  well.  As  the  goal  of  Infomax  is  to  optimize  W 
so  that  the  extracted  signals  match  the  source  signals,  an  important  characteristic  to  note 
is  that  W  is  chosen  so  that  ps(y‘)  =  p  (y ') ;  i.e.,  the  ratio  of  the  pdfs  is  equal  to  one. 
From  Equation  (27), 

p  (i/) 

=  L 

ps(y ) 

Therefore,  if  the  pdf  of  the  mapped  signal  Y  is  equal  to  one,  then  the  mapped  signal  is 
uniformly  distributed  on  [0,1],  which  indicates  a  maximum  entropy  function  for  random 
variables  bounded  by  [0,1]. 

b.  Signals  with  Maximum  Joint  Entropy  are  Mutually  Independent 

This  is  proven  in  Appendix  B.  Since  entropy  is  additive  for  independent 
random  events,  and  entropy  of  dependent  random  events  is  less  than  that  of  independent 
random  events,  maximizing  entropy  must  yield  mutually  independent  signals  [3]. 
Sections  B.3  through  B.5  addressed  obtaining  an  expression  of  entropy,  and  Section  D 
describes  the  method  to  maximize  entropy. 
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c.  Any  Invertible  Function  of  Mutually  Independent  Signals  Yields 
Mutually  Independent  Signals 

This  allows  W  to  be  optimized  so  that  g(Wx)  yields  independent 
signals.  If  g(Wx)  is  a  function  of  mutually  independent  signals,  then  Y  =  g(Wx)  also 
yields  mutually  independent  signals. 

d.  If  a  Function  is  Invertible,  its  Inverse  is  Invertible 

This  property  allows  the  extracted  signals  y  to  be  obtained  from  the 
mutually  independent  signals  Y  =  g(y)  =  g(Wx)  by  taking  the  inverse  y  =  g  '(Y) . 
Since  W  is  optimized  so  that  each  row  of  Y  is  independent,  its  inverse  y  is  also 
independent. 

2.  Assumptions 

The  Infomax  method  makes  the  following  assumptions. 

a.  All  Time  Samples  of  Each  Signal  Are  Independent 

This  assumption  is  not  realistic,  but  it  allows  M  independent  signals  to  be 
estimated  over  N  time  steps.  Since  communication  signals  violate  this  assumption,  it  is 
recognized  that  the  Infomax  method  may  not  have  universal  applicability  to 
communications  signals,  especially  those  signals  sampled  at  rates  much  higher  than  the 
Nyquist  rate. 


b.  All  Source  Signals  Can  Be  Approximated  by  the  Same  pdf 

This  assumption  is  also  unrealistic,  but  it  is  convenient  and  leads  to  useful 
results.  It  has  been  shown  by  [12]  that  the  Infomax  algorithm  is  somewhat  forgiving  of 
violations  of  this  assumption. 

c.  The  Model  pdf  is  an  Exact  Match  for  the  pdf  of  the  Source 
Signals 

The  third  assumption  is  highly  unlikely,  but  others  [2]  have  had  success 
with  Infomax  despite  the  violation  of  this  assumption.  This  is  because  the  exact  source 
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signals  are  unknown,  so  if  the  model  pdf  is  an  approximation  of  the  source  pdf,  then  the 
extracted  signals  are  the  best  possible  approximation  of  the  sources  signals. 

While  none  of  the  assumptions  above  are  truly  valid,  all  are  acceptable  as 
some  assumptions  must  to  be  made  about  the  source  signals  in  order  to  establish  a 
starting  point  for  blind  source  separation. 

D.  GRADIENT  ASCENT 

Equation  (63)  gives  the  entropy  of  the  transformed  signals  Y  to  within  a 
constant.  Now  that  an  expression  for  entropy  has  been  derived,  the  objective  of  Infomax 
is  to  find  an  unmixing  matrix  W  that  maximizes  the  entropy  of  Y ,  or  equivalently 
maximizes  h( Y)  where  Y  =  g(y)  =  g(Wx)  .  Gradient  ascent  is  the  method  used  to 
optimize  the  unmixing  matrix  W .  Essentially,  gradient  ascent  is  an  iterative  process  of 
taking  a  “step”  in  the  direction  of  maximum  gradient  until  a  local  maximum  is  reached. 
Gradient  ascent  requires  an  expression  for  the  gradient  of  entropy. 

1,  Gradient  of  Entropy 

Equation  (63)  is  rewritten  for  this  purpose  by  taking  the  expectation  over  time 
rather  than  assuming  that  all  time  steps  are  independent,  which  results  in 

/*(Y)  =  E |^hi/7i(y.)|  +  ln|w| .  (65) 

The  gradient  is  found  by  taking  the  partial  derivative  of  h  with  respect  to  W , 
dh/d W ,  and  for  the  purpose  of  simplification,  the  gradient  is  first  found  with  respect  to 
one  element  of  W ,  dh/dWj  ,  and  is  then  expanded  to  every  element.  Hence, 

_gy  JfgtogMLghN  (66) 

dWy  |tr  dWlj  j  dWj 

Simplification  of  this  partial  derivative  takes  place  in  two  parts,  treating  first  the  first 
tenn  and  then  the  second  term. 
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a.  The  First  Term  of  Equation  (66) 

To  simplify  the  expectation  in  Equation  (66),  the  partial  derivative  is 

examined: 


Let 


Equation  (67)  can  now  be  expressed 


Using  the  Chain  Rule  [1 1],  we  get 


o  hi  g'(v,) 

dWv 

(67) 

u  =  g\y,). 

(68) 

d\mi  _ 

dWij  ‘ 

(69) 

d  In  u  1  , 

- =  — u  . 

dW-j  u 


(70) 


Equation  (68)  gives  an  expression  for  u  ,  and  the  derivative  of  u  is 

Su  og  '( v, ) 


u  = 


nr  nr 


(71) 


(72) 


Replacing  the  expression  in  Equation  (70)  with  Equations  (68)  and  (71),  we  get 

dlnu  _  1  dg'(y,  ) 

dWj  g’(v,)  dWtJ 

The  expression  dg\yi)/dWjj  in  Equation  (72)  can  be  simplified  further  using  the  Chain 
Rule  in  Leibniz  notation  [11], 

og  og  dy 


From  Equation  (73), 


dW  dy  dW' 

og'(Vi)  _  %’(v,)  oyt 


dW„  dv ,  dW, 


Equation  (74)  further  simplifies  to 


dir.  dir. 


(73) 


(74) 


(75) 
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The  expression  dyijdWiJ  is  one  element  from  a  mixture  x  .  Generalizing  Equations  (46) 


and  (47)  with  this  expression,  we  get 


dW..  J  ‘ 


Substituting  Equation  (76)  into  Equation  (75),  we  obtain 


Now  substituting  Equation  (77)  into  Equation  (72),  we  get 


S  In  u  1 

^vT~gW 


g\y,)xr 


Equation  (78)  can  now  replace  the  expression  5 In g'iy^/dWy  in  the  expectation  in 


Equation  (66),  resulting  in 


Jf  ,  ainlwl 


dw„  tTg’Cv,)  7  5 In W, 


Now  g\yt)l g\yt )  is  further  simplified  for  convenience.  We  define 


iF(v)  =  ^-^l 

g\yt) 


and  Equation  (79)  can  be  expressed  as 


zrfv’xw/  X  ]  ,  Sln|w| 

I  ^  ’’  J  (  31.,  Uf 


dWv  [tr  din  Wy 

Equation  (81)  represents  the  partial  derivative  of  entropy  in  Equation  (66)  with  the  first 
tenn  fully  simplified. 

b.  The  Second  Term 

To  simplify  the  second  tenn  o  in  |w|/Sln  Iff ,  an  example  is  used  to  show 


Sln|w| 

Sin  Wj 


=rw-r_ 

L  Ah 


where  W  T  is  one  element  of  the  inverse  of  the  transposed  unmixing  matrix: 

Aii 
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w-r=[wr] 

When  M  =  2  ,  the  unmixing  matrix  W  is 

wn  w12 
w21  w22_ 

The  transpose  Wr  is 


(83) 

(84) 


Wr  = 


wn  w2l 


and  the  detenninant  of  the  unmixing  matrix  |w|  is  equal  to  the  detenninant  of  the 
transposed  unmixing  matrix  |\Vr|: 

|w|  =  wuw22  -w2lwn  =  |wr| .  (86) 

When  i  =  j  =  1 , 


3 In  W  _  dln(wnw22  -  w2lwn ) 
3  In  Wn  wn 


_  w22 


wnw22~w2iwn  W 


The  inverse  transpose  W  T  ,  using  Equation  (85)  in  Equation  (83),  is 


t  _rwrT 

-i  _  l 

w22 

~wn 

W22 

_  W22 

~LW  J 

L"W21 

wn  _ 

WUW22~W21Wn 

W 

The  methods  in  Equation  (87)  and  Equation  (88)  yield  the  same  result,  as  is  the  case  for 
all  values  of  i  and  j  .  This  example  illustrates  Equation  (82),  which  is  accepted  without 
proof  as  is  done  in  [12]. 


c.  The  Gradient  of  Entropy 

Equation  (82)  can  be  substituted  into  the  expression  for  the  gradient  of 
entropy  in  Equation  (81),  resulting  in 

+  (89) 

When  this  expression  is  expanded  to  all  elements  in  the  unmixing  matrix  W ,  it  yields  a 
complete  expression  for  the  gradient  of  entropy  Vh  ,  where  the  gradient  of  a  scalar  with 
respect  to  a  matrix  is  defined  as 
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dh 

dh 

dh 

8Wn 

dWl2 

dW 

urr\  M 

dh 

dh 

dh 

Vh  = 

dW2, 

dW22 

SW2U 

dh 

dh 

dh 

dW 

\_^vr  m\ 

8W 

uvv  M2 

dW 

MM 

The  gradient  of  entropy  Vh  for  all  elements  of  the  unmixing  matrix  W  is  then 

VA  =  W^+£{vF(y)xr}.  (91) 

By  assuming  that  signals  are  ergodic  [7],  the  expectation  can  again  be  mitigated,  resulting 
in 


Vh  =  W~T 


1 

H - 

N 


N  T 

2>(y')[*']  ■ 


(92) 


2.  Gradient  Ascent  Algorithm  for  Infomax 


The  optimal  unmixing  matrix  W  is  found  by  maximizing  entropy;  that  is, 
iteratively  following  the  gradient  Vh  until  a  local  maximum  is  reached.  This  is 
accomplished  with  the  following  algorithm 

Wnew=W 'M+rjVh  (93) 


where  ^  is  a  small  constant.  Inserting  the  expression  for  Vh  in  Equation  (92)  into 
Equation  (93),  we  get  the  expression  to  optimize  the  unmixing  matrix  W  to  maximize 
entropy: 


Wnew=Wold+?7 


(94) 


1  N  7 

V  t= i  y 

Equation  (94)  is  the  general  form  of  the  Infomax  algorithm  using  gradient  ascent  to 
optimize  the  unmixing  matrix  W .  It  is  important  to  note,  however,  that  the  gradient 
ascent  algorithm  only  finds  a  local  maximum,  which  is  not  necessarily  the  global 
maximum  of  the  function.  This  can  be  mitigated  by  running  multiple  trials  of  the 
gradient  ascent  algorithm,  initiated  from  different  starting  points. 
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The  expression  'F(y')  is  shown  in  Equation  (80)  to  equal  g"(y)/ g’(y)  •  It  should 
be  recalled  that  g(y)  is  the  model  cdf  of  the  source  signals,  so  'F(y')  depends  upon  the 
source  signals  that  the  algorithm  aims  to  extract.  Since  W  is  optimized  by  maximizing 
entropy  of  the  transformed  signals  Y  =  g(y)  and  maximum  entropy  signals  Y  are 

mutually  independent,  the  signals  y  =  g  1  (Y)  are  also  mutually  independent.  Since 
Infomax  extracts  a  set  of  signals  y  which  are  mutually  independent  and  the  only 
mutually  independent  signals  possible  are  the  original  source  signals,  the  results  of  the 
optimized  algorithm  in  Equation  (94)  are  the  source  signals  s . 

In  this  chapter,  the  Infomax  algorithm  was  derived  and  its  implementation  using 
gradient  ascent  was  analyzed.  In  Chapter  IV,  the  expressions  for  entropy  and  gradient  are 
applied  to  specific  signal  types,  and  the  results  from  simulations  of  the  Infomax  method 
using  MATLAB  are  given. 
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IV.  MODELING  AND  SIMULATION 


Chapter  III  was  dedicated  to  obtaining  expressions  for  entropy  and  its  gradient. 

1  M  N 

The  entropy  h( Y) ,  as  shown  in  Equation  (63),  is  h( Y)  =  —  ZZlnAXE)  +  Hw|.  The 

N  i= i  t=\ 

1  n  j 

gradient  Vh  was  shown  in  Equation  (92)  to  be  Vh  =  W~r  +  —  ^^/(y,)[x'  J  ,  where 

'F(y')  is  g  "(y)/g'(y)  •  The  change  in  entropy  is  maximized  using  the  gradient  ascent 
algorithm  in  Equation  (93),  Wnew  =  Wold  +  rjVh. 


It  is  clear  from  the  equations  above  that  both  the  entropy  and  its  gradient  are 
dependent  upon  a  model  pdf  of  the  source  signals,  ps(y)  or  g'( y),  and  it  should  be 
recalled  that  the  extracted  signals  are  transformed  through  their  model  cdf  Y  =  g(y) . 
Stone  refers  to  this  as  the  cdf/pdf-matching  property  of  Infomax,  and  it  is  through  the 
careful  selection  of  a  model  cdf  and  pdf  that  the  Infomax  method  is  tuned  towards  the 
desired  type  of  extracted  signal.  If  one  desires  extracted  speech  (or  audio)  signals,  as  in 
the  cocktail  party  example  of  Chapter  I,  a  model  cdf  and  pdf  that  resembles  audio  signals 
should  be  used  in  the  Infomax  algorithm  [12]. 


A.  HIGH-KURTOSIS  SIGNALS 

Kurtosis  A  is  a  measure  of  peakedness  of  a  pdf  and  is  defined  as 

E{x4} 

<95) 

where  E  j x4  j  is  the  fourth  central  moment  and  cjx2  j  is  the  second  central  moment. 

Gaussian  signals  have  zero  kurtosis.  When  kurtosis  is  negative,  a  signal  is  sub-Gaussian. 
A  super-Gaussian  signal  has  positive  kurtosis  and  is  referred  to  as  a  high-kurtosis  signal 
[12]. 

Figure  5  depicts  a  sample  audio  signal  of  the  first  few  bars  of  the  “Hallelujah 
Chorus”  from  Handel’s  Messiah,  from  MATLAB’s  “Datafun”  directory. 


29 


Figure  5  Sample  audio  signal. 


As  audio  signals  spend  the  majority  of  their  time  around  zero,  they  typically  have  super- 
Gaussian  pdfs.  Therefore,  Infomax  uses  a  super-Gaussian  pdf  to  model  high-kurtosis 
signals  such  as  audio  signals  [12]. 

1.  Adapting  the  Infomax  Algorithm 

A  typical  cdf  used  to  model  high-kurtosis  signals  is  the  hyperbolic  tangent, 

plotted  in  Figure  6.  While  this  is  not  a  valid  cdf  because  it  contains  negative  values,  it 

was  used  successfully  in  [12],  and  was  also  used  in  these  simulations.  Simulations  were 
also  run  with  a  shifted  and  scaled  version  of  the  hyperbolic  tangent  as  a  model  cdf,  and  a 
shifted  and  scaled  version  of  its  derivative  as  the  model  pdf.  These  were  more  accurate 
representations  of  the  cdf  and  because  they  were  valid  models:  the  cdf  was  positive  and 
ranged  from  [0,1],  and  the  pdf  had  an  area  of  1.  The  results  of  the  scaled  and  shifted 

hyperbolic  tangent  were  consistent  with  the  results  using  the  true  hyperbolic  tangent 
function.  The  original  version  of  the  hyperbolic  tangent  was  used  for  ease  of  comparing 
results  with  the  work  done  in  [12]. 
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Hence, 


Y  =  g(  y)  =  tanh(y)  (96) 

Since  the  cdf  g(y)  was  chosen  to  be  the  hyperbolic  tangent  in  Equation  (96),  the 
derivative  of  this  is  g'(  y) ,  or  the  pdf,  which  is  illustrated  in  Figure  7  [1 1].  This  pdf  is  not 
ideal  because  its  area  is  2  ,  but  was  also  used  successfully  in  [12].  Hence, 


g  '(y)  =  -7-  tanh(y) =  sech2  (y )  = 1  - tanh2  (y )  (97) 

dy 


Figure  7 


Hyperbolic  tangent  derivative. 
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Adapting  Equation  (63)  for  a  high-kurtosis  signal,  the  expression  for  entropy  becomes 

i  M  N 

K *)  =  W  Z  Z ln  ( 1  “ tanh2  ^  )  +  ln  I Wl  •  (98) 

A  i= 1  t= 1 

The  gradient  of  entropy  V/z  contains  the  ratio  'F(y)  ,  where  'F(y)  =  g”(y)/g'(y)  ■ 
While  the  pdf  g  '(y)  was  given  in  Equation  (97),  the  derivative  of  the  pdf,  g  "(y)  is  also 
necessary  and  is 

g  "(y)  =  -  j- g  '(y)  =  ( 1  “ tanh2  (y))  =  ~^~( tanh2  ( y ')) =  ~2  tanh(y)  -7-  tanh(y)  (99) 

ay  ay v  ay  ’  ay 

Since  <7/<iv(tanh(y))  is  equal  to  g  '(y) ,  Equation  (99)  becomes 

-2  tanh(y)g'(y),  (100) 

and  the  resulting  expression  for  the  derivative  of  the  model  pdf  is 

g  "(y)  =  -2  tanh(y)g  '(y) .  (101) 

The  ratio  'E(y)  in  the  gradient  V/z  is 

^(y)  =  =  2  lanh(y|g  (y)  =  -2  tanh(y) .  (102) 

g(y)  g(y) 

Adapting  Equation  (92)  for  a  high-kurtosis  signal,  we  get  the  following  gradient  Vh 

i  n  T 

V/z  =  W~r  H - Z -2  tanh(y ' ) [ x'  ]  .  (103) 

N  t=\ 

Expressions  for  high-kurtosis  implementation  of  Infomax  have  been  obtained,  so 
now  a  computing  tool  is  necessary  to  perform  the  algorithm  effectively  and  efficiently. 

2.  Implementation  in  MATLAB 

Appendix  D  in  [12]  shows  sample  code  for  the  implementation  of  the  Infomax 
algorithm  for  high-kurtosis  signals  using  gradient  ascent.  The  code  was  adapted  for 
easier  entry  of  variables,  which  allowed  inputs  to  be  changed  and  simulations  to  be  run 
and  re-run  with  greater  ease.  With  Stone’s  default  values  for  step  size  77  (rj  =  0.25  )  and 
maximum  number  of  iterations  (100),  the  algorithm  converged  approximately  half  the 
time.  It  was  clear  that  an  improved  gradient  ascent  algorithm  was  necessary  for  more 
consistent  performance,  and  some  additional  complexity  in  the  code  was  accepted  to 
improve  the  reliability  of  the  results. 
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a.  Improving  the  Gradient  Ascent  Algorithm  for  Faster 
Convergence 

The  goal  of  improving  the  gradient  ascent  algorithm  Wnew  =  Wold  +  rjS/h 

was  not  to  find  the  optimal  algorithm  but  instead  to  increase  the  rate  of  convergence  as 
well  as  the  accuracy  of  the  algorithm  results.  The  unmixing  matrix  W  is  initially  set  to 
the  identity  matrix.  However,  rather  than  taking  the  same  size  step  in  the  direction  of 
maximum  increase,  larger  steps  are  taken  as  long  as  entropy  h( Y)  continues  to  increase. 
If  entropy  decreases,  it  is  assumed  that  the  algorithm  missed  the  maximum.  Rather  than 
continuing  to  take  steps,  the  algorithm  regresses  to  the  last  “good”  values  of  W  and 
h(\)  and  begins  taking  smaller  steps,  which  gradually  increase  again  as  long  as  entropy 
increases.  A  flow  chart  of  the  improved  gradient  ascent  algorithm  is  shown  in  Figure  8. 

b.  Improving  Maximization  by  Varying  Initial  W 

Gradient  ascent  finds  the  nearest  local  maximum  of  a  function,  which  is 
not  necessarily  the  maximum  function  value.  Since  Infomax  finds  extracted  signals  by 
maximizing  entropy,  it  is  necessary  for  the  global  maximum  to  be  found  for  the  algorithm 
to  properly  converge.  A  higher  probability  of  locating  a  global  maximum  was  obtained 
by  creating  a  function  of  the  improved  gradient  ascent  algorithm  so  that  gradient  ascent 
could  be  repeated  multiple  times  with  varying  starting  points.  The  gradient  ascent 
function  encompasses  the  process  shown  in  Figure  8.  The  Infomax  MATLAB  code  was 
adapted  so  that  the  gradient  ascent  function  could  be  repeated  multiple  times.  The  first 
repetition  uses  the  identity  matrix  for  the  initial  value  of  W ,  and  subsequent  repetitions 
randomly  select  an  unmixing  matrix  W ,  which  is  then  multiplied  by  some  small  step 
constant  and  the  repetition  number.  The  adapted  MATLAB  code  for  high-kurtosis 
signals  is  included  in  Appendix  D,  and  the  gradient  ascent  function  is  included  in  the 
Appendix  E. 
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Figure  8  Improved  gradient  ascent  algorithm  flow  chart. 
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3. 


Results  and  Conclusions 


The  modified  MATLAB  code  was  run  for  M  signals  with  M  =  2,  3,  and  6 . 
Adding  to  the  complexity  of  the  code  increased  the  run  time  but  also  resulted  in  more 
reliable  convergence  (approximately  80%  of  simulations  with  M  =  2  converged  in  35 
iterations  or  fewer). 


a.  Results  for  M  =  2  Source  Signals 

High-kurtosis  signals  were  imported  from  MATLAB ’s  “data  fun” 
directory  and  are  shown  as  Signal  1  (a  bird  chirp)  and  Signal  2  (a  gong)  in  the  first  row  of 
Figure  9.  The  random  mixtures  of  the  two  source  signals  are  shown  in  the  second  row  of 
Figure  9.  The  Infomax  algorithm  was  run  with  the  values  shown  in  Table  1. 


Table  1  Algorithm  variable  values  for  M  =  2  . 


Infomax  algorithm  iterations 

100 

Initial  step  size  77 

0.1 

Step  size  increase  factor  a 

1.2 

Step  size  decrease  factor  /? 

0.1 

Gradient  Ascent  repetitions 

5 

The  values  of  these  variables  were  not  analyzed  to  be  optimal,  but  produced  consistent 
results.  The  signals  extracted  by  the  Infomax  algorithm  are  shown  in  row  three  of  Figure 
9.  The  similarity  of  the  extracted  signals  to  the  original  source  signals  is  evident.  As 
Infomax  does  not  preserve  the  ordering  of  the  signals,  the  results  would  sometimes  be 
reversed  (as  shown),  with  the  gong  extracted  as  “Extracted  Signal  1”  and  the  chirp 
extracted  as  “Extracted  Signal  2.”  Additionally,  Infomax  does  not  necessarily  preserve 
the  sign  of  a  signal  with  a  pdf  that  is  even  about  zero,  although  this  is  not  evident  in 
Figure  9. 
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Figure  9  Infomax  results  for  M  =  2  . 


Figure  10  shows  gradient  ascent  function  values  for  entropy  h{ Y), 
gradient  of  entropy  Vh,  and  step  size  // .  In  Figure  10,  results  converged  after 
approximately  15  iterations,  although,  generally,  the  results  converged  after  35  to  50 
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iterations.  The  middle  plot  depicts  \Vh\ ,  the  magnitude  of  the  entropy  gradient.  As 

entropy  is  maximized,  the  magnitude  of  the  gradient  decreases,  indicating  the  maximum 
is  very  near  the  current  value.  The  bottom  plot  shows  how  the  step  size  //  changes  with 
the  “learning”  gradient  ascent  algorithm. 


Function  Values  -  Entropy 


Figure  10  Entropy  h( Y),  gradient  V/z ,  and  rj  values  for  M  =  2  . 
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b.  Results  for  M  =  3  Source  Signals 

The  addition  of  a  third  signal  from  the  MATLAB  “Data  fun”  directory 
(the  “splat”  sound  of  spilling  paint)  and  third  signal  mixture  to  the  Infomax  algorithm 
slightly  increases  the  complexity  and  computation  time.  The  values  used  in  the  algorithm 
are  shown  in  Table  2. 


Table  2  Algorithm  variable  values  for  M  =  3  . 


Infomax  algorithm  iterations 

100 

Initial  step  size  77 

0.1 

Step  size  increase  factor  a 

1.2 

Step  size  decrease  factor  f 

0.1 

Gradient  Ascent  repetitions 

5 

With  the  same  number  of  iterations  and  the  same  number  of  gradient 
ascent  repetitions  performed,  the  results  were  fairly  consistent  with  those  obtained  in  part 
a,  although  they  converged  less  often  (approximately  50%  of  runs  converged).  The 
results  converged  more  frequently  when  the  number  of  gradient  ascent  repetitions 
increased,  but  that  also  increased  the  computation  time.  The  results  of  a  sample  run  of  a 
“chirp,”  a  “gong,”  and  a  “splat”  are  shown  in  Figure  11.  As  in  the  two-signal  case  of  part 
a,  the  extracted  signals  are  clearly  close  matches  for  the  source  signals.  Also,  although 
Figure  1 1  suggests  otherwise,  in  general  signal  ordering  is  not  preserved  [12]. 
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Figure  1 1  Infomax  results  for  M  =  3 . 
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c. 


Results  for  M  =  6  Source  Signals 


The  Infomax  method  was  tested  on  six  source  signals  and  mixtures  from 
MATLAB’s  “Data  fun”  directory,  with  the  types  of  each  source  signal  shown  in  Table  3. 


Table  3  Signal  descriptions  for  M  =  6  source  signals. 


Signal  1 

Signal  2 

Signal  3 

Signal  4 

Signal  5 

Signal  6 

chirp 

gong 

splat 

laughter 

train  whistle 

music 

The  values  for  the  Infomax  iterations  and  gradient  ascent  variables  are 
shown  in  Table  4. 


Table  4  Algorithm  variable  values  for  M  =  6  . 


Infomax  algorithm  iterations 

100 

Initial  step  size  77 

0.1 

Step  size  increase  factor  a 

1.2 

Step  size  decrease  factor  /3 

0.1 

Gradient  Ascent  repetitions 

5,  10,  15,20,  25,50,  100 

In  the  six  source  signal  case,  multiple  runs  with  increasing  numbers  of 
gradient  ascent  iterations  were  attempted  in  order  to  find  a  reasonable  number  of 
iterations  where  the  function  converged  fairly  consistently.  For  five  iterations  of  the 
gradient  ascent  function,  it  was  clear  that  similarities  between  the  source  signals  and 
extracted  signals  existed,  and  the  entropy  value  was  maximized  as  shown  in  the  first  plot 
of  Figure  12.  However,  zero  of  ten  trials  with  five  gradient  ascent  repetitions  truly 
converged,  as  is  evident  from  a  sample  trial  shown  in  Figure  13.  In  fact,  even  100 
repetitions  of  the  gradient  ascent  algorithms  did  not  produce  consistently  convergent 

results,  although  entropy  was  still  maximized,  as  evident  from  the  top  plot  in  Figure  14. 
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The  source  and  extracted  signals  for  100  gradient  ascent  repetitions  are  shown  in  Figure 
15.  Although  the  results  are  slightly  better  than  the  five  repetition  case,  there  was  a 
significant  increase  in  computation  time  for  a  relatively  small  improvement  in  results. 


Function  Values  -  Entropy 


Figure  12  /?(Y) ,  V/z ,  and  //  values  for  M  =  6  for  5  gradient  ascent  repetitions. 
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Figure  13  Source  and  extracted  signals  for  five  gradient  ascent  repetitions. 
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Figure  14  h(\) ,  V/z ,  and  //  values  for  M  =  6  for  100  gradient  ascent  repetitions. 
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Figure  15  Source  and  Extracted  Signals  for  100  gradient  ascent  repetitions. 
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d.  Conclusions 

For  high-kurtosis  signals,  the  Infomax  algorithm  with  multiple  iterations 
of  a  “learning”  gradient  ascent  function  proved  quite  useful  in  extracting  small  numbers 
of  signals  from  signal  mixtures.  As  the  number  of  source  signals  increased,  the  algorithm 
was  less  able  to  reliably  extract  source  signals  quickly  and  consistently.  This  was  likely 
because  the  random  mixtures  of  an  increased  number  of  source  signals  s  resulted  in 
signal  mixtures  x  with  “bumpier”  entropy  functions;  i.e.  entropy  h( Y)  =  h( Wx)  with 
many  local  maximums.  The  increased  occurrence  of  local  maximums  required  more 
iterations  of  the  gradient  ascent  function  with  different  initial  values  for  the  unmixing 
matrix  W  to  find  a  starting  point  that  might  find  the  global  maximum  rather  than  a  local 
maximum,  since  the  Infomax  algorithm  only  converges  and  extracts  source  signals  by 
finding  the  global  maximum  of  entropy  h(\) . 

B.  SIMPLE  COMMUNICATIONS  SIGNALS 

Once  it  was  determined  that  the  Infomax  algorithm  was  fairly  reliable  for  small 
numbers  of  high-kurtosis  signals,  it  was  tested  on  a  simple  communications  signal,  the 
polar  non-return  to  zero  signal  [5]. 

1.  Adapting  the  Infomax  Algorithm  for  a  Polar  NRZ  Signal 

The  implementation  of  a  polar  NRZ  signal  into  the  Infomax  algorithm  first 
required  a  function  that  would  create  a  polar  NRZ  signal  in  MATLAB.  Code  for  this 
function  is  included  in  Appendix  F.  The  polar  NRZ  function  was  used  to  generate  two 
signals  with  randomly  selected  amplitudes,  bit  rates,  and  time  shifts,  and  these  signals 
were  randomly  mixed.  The  first  attempt  at  extracting  polar  NRZ  signals  used  the  high- 
kurtosis  model  cdf  g(y)  =  tanh(y)  and  model  pdf  g'(y)  =  l-tanh2(y)  .  As  expected,  the 
algorithm  was  unsuccessful  at  extracting  the  two  polar  NRZ  signals  after  multiple 
attempts,  and  it  was  necessary  to  model  the  cdf  and  pdf  of  a  polar  NRZ  signal. 

A  sample  polar  NRZ  signal  is  shown  in  Figure  16.  The  cdf  is  shown  in  Figure  17, 
and  its  associated  derivative  pdf  is  shown  in  Figure  18. 
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Figure  16  Sample  polar  NRZ  signal. 
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Figure  17  Theoretical  polar  NRZ  cdf. 


Polar  NRZ  Probability  Density  Function  (pdf) 
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Figure  18  Theoretical  polar  NRZ  pdf. 


The  equation  representing  the  cdf  g{y)  of  a  polar  NRZ  signal  as  shown  in  Figure  17  is 

g(  v)  =  0.5w(  y  + 1)  +  0.5«( y  - 1) ,  (1 04) 

46 


where  w(v)  is  the  unit-step  function.  The  corresponding  derivative  g '( v)  is 

g  '(y)  =  0.5S(  v  + 1)  +  0.5  S(y  - 1)  (105) 

The  theoretical  cdf  and  pdf  for  the  polar  NRZ  signal  create  a  dilemma  in  that  they  cannot 
be  implemented  in  MATLAB  due  to  the  delta  function  in  the  pdf.  Therefore,  it  was 
necessary  to  create  approximations  of  the  cdf  and  pdf  that  could  be  implemented  in 
MATLAB. 

2.  Creating  Model  Statistical  Functions 

As  Infomax  aims  to  extract  source  signals  based  on  a  model  cdf  and  pdf,  it  was 
necessary  to  create  functions  for  the  cdf  g(Y) ,  the  pdf  g  '(Y)  ,  and  the  derivative  of  the 
pdf  g"(Y)  that  could  be  implemented  in  the  Infomax  code  using  MATLAB.  This  was 
accomplished  by  two  separate  approaches.  The  first  of  these  approaches  modeled  the 
delta  functions  in  the  pdf  as  triangles  with  arbitrarily  narrow  bases.  The  second  approach 
used  a  modified  version  of  the  hyperbolic  tangent  function  to  more  closely  model  the  cdf 
and  pdf  of  a  polar  NRZ  signal. 

a.  The  Triangular  Model 

The  first  approach  involved  approximating  the  delta  function  in  the  pdf  of 
Figure  18  as  a  triangle  with  a  base  of  2 y ,  where  y  was  chosen  as  an  arbitrarily  small 
number,  as  shown  in  Figure  19. 


Approximate  Polar  NRZ  Probability  Density  Function  (pdf) 


Figure  19  Approximate  polar  NRZ  pdf. 
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The  approximate  pdf  and  cdf  are  derived  in  Appendix  C.  The  resultant  expression  for  the 
approximate  pdf  q  ’( y)  for  a  polar  NRZ  signal  is 

fo 


q\y)  = 


1  I  +  7 

- V  VH - y 

2  r2  2  y2 
-1  y-  1 

~2fyJr  ~2f 

0 

1  y  —  1 

2 y2 '  2 y1 
-1  l  +  r 

2 y2 '  2 72 

0 


v  <  -(I  +  7) 

-(1  +  7)  -  T  <  -1 

-1<7<7-1 
7~1 - T  < 1-7 
1-7  -  T  <  1 

1  <  y  <  1  +  7 
v  >  1  +  7 


(106) 


The  function  was  implemented  in  MATLAB  (code  for  “polarNRZpdf  ’  in  Appendix  H). 

The  approximate  cdf  q( y)  was  found  by  integrating  the  pdf  q '( v) ,  and  is 


q(y)  = 


0 


\m(y2  -(\  +  y)2)  +  b{y-(\-yj) 
^m(l-y2)  +  b(l  +  y) 


2 

0.25  + 

0.5 

q(y)  =  0-5+ 

q(y)  =  0.75  + 
1.0 


^m(y2  -  (1-/)2)  +  6(v-(1-7)) 
^m[l-y2)  +  b{y-\) 


y  <  -(I  +  7) 

(1  +  7)  -  T<  -1 

-1<7<7-1 
7-1-T  <1~7  (107) 

1_7- T  < 1 

1 <  y < I  +  7 
v  >  1  +  7 


The  function  was  implemented  in  MATLAB  (code  for  “polarNRZcdf  ’  in  Appendix  G). 
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The  derivative  of  the  pdf  q  "( v)  was  simple  to  construct  by  taking  the 


slope  values  from  the  appropriate  regions  of  the  pdf,  resulting  in 


q\y)  = 


0 


1 


-1 


2/ 

0 

1 


2r 

-l 


2r 

o 


y<-(l  +  r) 

-(\  +  y)<  y<-\ 

-\<y<y-\ 
y-\<y<\-y 
\-y< y<l 

1 <  y < 1  +  y 
y>l  +  y 


(108) 


Code  for  the  “polarNRZdpdf  ’  function  is  included  in  Appendix  I. 


The  MATLAB  implementations  of  the  approximate  model  cdf  q(y),  pdf 
q\y),  and  pdf  derivative  q "( v)  are  plotted  for  y  =  0.1  in  Figure  20.  The  approximate 
functions  q(y)  and  q'(y)  bear  a  striking  resemblance  to  the  theoretical  functions  g(y) 
and  g  ’(y)  shown  in  Figure  17  with  the  added  benefit  that  they  can  be  implemented  in 
MATLAB  so  that  the  Infomax  algorithm  can  be  adapted  for  the  polar  NRZ  signal. 
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Approximate  Polar  NRZ  cdf  q(y) 


Figure  20  Approximate  polar  NRZ  cdf  q(y) ,  pdf  q '( v) ,  and  q "( v)  for  y  =  0.1. 
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b.  The  Hyperbolic  Tangent  Model 

Since  the  long  range  of  zeroes  in  the  triangular  model  caused  divide  by 
zero  problems  in  the  VF  function  (Equation  (80))  when  implemented,  and  the  hyperbolic 
tangent  was  used  successfully  in  approximating  the  cdf  and  pdf  of  the  high-kurtosis 
signals  in  part  A,  modifications  were  made  to  adapt  the  theoretical  polar  NRZ  cdf  from 
Figure  17  and  pdf  from  Figure  18. 


A  second  approximate  polar  NRZ  cdf  6(y)  is  created  by  duplicating  and 
shifting  the  hyperbolic  tangent  function  so  as  to  more  closely  approximate  the  theoretical 
cdf  A  good  approximation  of  the  cdf  is  found  to  be 


^-tanh  (cr(  v  + 1))  + 1  y<  0 
1 


0{v)  =  ■ 

where  a  is  a  compression  factor  that  controls  the  slope  of  the  function. 


(109) 


—  tanh  (cr(  v  - 1))  +  3  y>  0 


The  approximate  pdf  O'(y)  is  found  by  taking  the  derivative  of  the  cdf 
6(y  ) ,  and  the  result  is 


0\y)  = 


■^-(l-tanh2  (cr(  v  +  l)))  y<  0 
(l-tanh2  (cr(  v-l)))  y>  0 


(110) 


An  expression  for  0"(y)  is  found  by  taking  the  derivative  of  the  pdf 


e '(>’)■■ 


e\y)  =  =  ZfZ(  1  -  tanh2  (<r(y±l))) 


dy 


dv{  4 


=  0-  — 
dy 


d  (  a  ^ 


—  tanh2(r7(y±l))J  (111) 


Using  the  chain  rule,  we  get 

6 =  ~[^T tanh ( ^ ± ^ ' j ( 1 1  _  tanh2  ( ±  1) ) ) 


a 


(112) 


From  Equation  (1 10),  0'(y)  =  —  (l-tanh2  (cr(y  ±1))),  so  Equation  (1 12)  simplifies  to 
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(113) 


0\y)  = 


~~  tanh  (<r(  v  ±  1))  6  \y) 


The  resulting  expression  for  the  approximate  pdf  derivative  6 "( v)  is 

-tanh(<j(y  +  \))0'(y)  y<  0 

-tanh(o-(v-l))6,’(v)  y>  0 


0"(y)  = 


-a 

~T 

_2 

—<T 


(114) 


l  2 


The  approximate  polar  NRZ  cdf  6(y)  and  pdf  6  \y)  using  the  hyperbolic 
tangent  function  are  shown  in  Figure  21.  Like  the  triangular  model,  they  closely 
resemble  the  theoretical  versions  in  Figure  17  and  Figure  18.  MATLAB  code  for  these 
functions  is  included  in  Appendices  J,  K,  and  L. 


Approximate  Polar  NRZ  cdf  e(y) 


Approximate  polar  NRZ  cdf  6{y)  and  pdf  0'(y)  with  cr  =  10  . 
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Figure  2 1 


3. 


Results  and  Conclusions 


Numerous  simulations  were  run  using  the  triangle  model  to  approximate  polar 
NRZ  statistical  functions  with  y  =  0. 1 .  These  simulations  were  repeated  using  the 
hyperbolic  tangent  model  with  cr  =  10  so  as  to  closely  approximate  the  theoretical  cdf 
and  pdf  of  the  polar  NRZ  signal.  The  long  ranges  of  zero  for  q  ’(y)  and  q  "(y)  produced 
with  y  =  0.1,  as  shown  in  Figure  20,  and  the  long  ranges  of  near-zero  values  for  6'(y) 
caused  the  Infomax  algorithm  to  behave  poorly.  As  T'(y)  is  the  second  derivate  divided 
by  the  first  derivative,  and  the  first  derivative  contained  long  ranges  of  zero,  the  divide  by 
zero  produced  erroneous  results.  To  eliminate  the  long  ranges  of  zero  values,  trials  were 
repeated  using  the  triangle  model  with  y  =  1.0  (Figure  22)  and  the  hyperbolic  tangent 
model  with  cr  =  2  (Figure  23). 

Setting  y  equal  to  1.0  removed  the  long  stretches  of  zeroes  in  the  center  of  the 
pdf  q\y )  and  its  derivative  q"(y) ,  as  shown  in  Figure  22.  This  proved  advantageous  in 
that  simulations  with  M  =  2  signals  were  successful  over  fifty  percent  of  the  time. 
However,  also  evident  from  Figure  22  is  that  the  resulting  cdf  q(y)  and  pdf  q\y)  are 
much  poorer  approximations  of  the  ideal  cdf  g(y)  and  pdf  g  \y)  .  Similarly,  using  the 
compression  factor  cr  =  2  in  the  hyperbolic  tangent  model  eliminated  the  long  ranges  of 
near-zero  values,  as  is  evident  from  Figure  23.  This  improved  the  reliability  of  the 
Infomax  algorithm.  Again,  the  resulting  cdf  <9(y)  and  pdf  0  ’(y)  are  poorer 
approximations  of  the  theoretical  cdf  and  pdf  of  the  polar  NRZ  signal. 

In  both  cases,  the  less  ideal  approximation  of  the  polar  NRZ  cdf  and  pdf  produced 
the  best  results  for  the  Infomax  algorithm.  However,  all  results  in  this  section  were 
obtained  using  the  hyperbolic  tangent  model  with  cr  =  2  because  the  hyperbolic  tangent 
model  provided  a  more  continuous  pdf  which  was  differentiable  at  more  points  than  the 
triangular  model  pdf.  Therefore,  the  hyperbolic  tangent  model’s  approximation  of  the 
polar  NRZ  cdf  and  pdf  produced  more  consistent  results  which  converged  faster,  and 
with  a  greater  number  of  signals  and  signal  amplitudes,  than  the  triangular  model 
approximation. 
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Approximate  Polar  NRZ  cdf  q(y) 
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Figure  22  Approximate  polar  NRZ  cdf  q(y) ,  pdf  q '( y) ,  and  q "( v)  for  y  =  1 .0  . 
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Approximate  Polar  NRZ  cdf  e(y) 


Figure  23  Approximate  polar  NRZ  cdf  6{y )  ,  pdf  6\y) ,  and  6"(y)  with  cr  =  10  . 
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a.  Results  for  M  =  2  Source  Signals 

The  Infomax  algorithm  was  run  with  two  source  signals  of  randomly 
selected  amplitude,  bit  rate,  and  time  shift  using  the  values  shown  in  Table  5,  and 
converged  in  70%  of  trials. 


Table  5  Algorithm  variable  values  for  M  =  2  . 


Infomax  algorithm  iterations 

100 

Initial  step  size  77 

0.05 

Step  size  increase  factor  a 

1.2 

Step  size  decrease  factor  (3 

0.1 

Gradient  Ascent  repetitions 

50 

Figure  24  shows  the  results  of  a  sample  trial,  where  the  top  row  shows  the 
randomly  generated  source  signals,  the  middle  row  shows  the  random  mixtures  of  the 
signals,  and  the  bottom  row  shows  the  extracted  signals.  It  is  clear  from  Figure  24  that 
Extracted  Signal  1  shares  the  same  bit  pattern  as  Source  Signal  2,  but  with  a  different 
magnitude,  and  the  case  is  the  same  with  Source  Signal  1  and  Extracted  Signal  2.  In  fact, 
every  single  successful  run  of  the  Infomax  algorithm  extracted  signals  with  a  magnitude 
of  approximately  one.  This  result  is  due  to  the  tendency  of  the  Infomax  algorithm  to 
match  the  pdf  of  the  extracted  signal  to  the  model  pdf  supplied  to  the  algorithm.  Since 
the  model  pdf  was  developed  around  a  polar  NRZ  signal  with  an  amplitude  of  1.0,  the 
extracted  signal  also  had  an  amplitude  of  1.0.  In  addition  to  not  preserving  the 
magnitude  of  the  original  signal,  repeated  trials  showed  the  algorithm  did  not  necessarily 
preserve  the  ordering  or  the  phase  (+/-)  of  the  extracted  signals,  as  was  the  case  with 
the  high-kurtosis  signals. 
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Signal  1 


Signal  Mixture  1 


Extracted  Signal  1 


Signal  2 


Signal  Mixture  2 


Extracted  Signal  2 


Figure  24  Source  and  extracted  signals  for  M  =  2  polar  NRZ  signals. 
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b.  Results  for  M  =  3  Source  Signals 

The  Infomax  algorithm  was  repeated  for  three  polar  NRZ  source  signals  of 
the  same  amplitude  and  a  randomly  selected  bit  rate  and  time  shift,  as  well  as  three 
source  signals  with  randomly  selected  amplitude,  bit  rate,  and  time  shift  using  the  values 
shown  in  Table  6. 


Table  6  Algorithm  variable  values  for  M  =  3  . 


Infomax  algorithm  iterations 

100 

Initial  step  size  77 

0.01 

Step  size  increase  factor  <2 

1.2 

Step  size  decrease  factor  f 

0.1 

Gradient  Ascent  repetitions 

100 

Trials  with  M  =  3  required  more  iterations  of  the  gradient  ascent 
algorithm  and  converged  less  frequently  than  the  two-signal  case.  While  the  Infomax 
algorithm  was  successful  in  extracting  two  source  signals  often  (70%  of  trials),  it  only 
truly  converged  to  extract  all  three  source  signals  in  5%  of  trials.  The  result  of  a 
converging  trial  is  shown  in  Figure  25.  From  this  figure,  it  is  clear  that  Extracted  Signal 
1  is  Source  Signal  2  with  amplitude  of  one.  Source  Signal  1  is  extracted  as  Extracted 
Signal  2,  and  Source  Signal  3  is  Extracted  Signal  3.  As  was  the  case  for  M  =  2  polar 
NRZ  source  signals,  successful  trials  do  not  preserve  the  magnitude  of  the  original  source 
signals,  although  Infomax  was  equally  capable  of  distinguishing  between  signals  with  the 
same  amplitude  (not  shown)  and  signals  with  randomly  selected  amplitude.  Additionally, 
the  algorithm  does  not  always  preserve  signal  order  or  phase. 
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Signal  1 


Signal  Mixture  1 


Extracted  Signal  1 


Signal  2  Signal  3 


Signal  Mixture  2  Signal  Mixture  3 


Extracted  Signal  2  Extracted  Signal  3 


Figure  25 


Source  and  extracted  signals  for  M  =  3  polar  NRZ  signals. 
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c.  Results  for  M  =  6  Source  Signals 

The  Infomax  algorithm  was  again  repeated  for  six  polar  NRZ  source 
signals  of  randomly  selected  amplitude,  bit  rate,  and  time  shift  using  the  values  shown  in 
Table  7. 


Table  7  Algorithm  variable  values  for  M  =  6  . 


Infomax  algorithm  iterations 

100 

Initial  step  size  rj 

0.01 

Step  size  increase  factor  a 

1.2 

Step  size  decrease  factor  /? 

0.1 

Gradient  Ascent  repetitions 

300 

For  the  given  values,  no  signals  of  significance  were  extracted.  Successful 
implementation  for  a  reasonably  fast  convergence  of  the  algorithm  for  greater  than  three 
source  signals  either  requires  a  more  advanced  gradient  ascent  algorithm  or  a  more 
exhaustive  search  for  the  optimal  unmixing  matrix  W,  resulting  in  a  significantly 
increased  simulation  time. 

d.  Conclusions 

For  polar  NRZ  signals,  the  Infomax  algorithm  as  implemented  in 
Appendix  M  was  consistently  successful  in  extracting  two  signal  patterns  from  two  signal 
mixtures;  however,  it  does  not  preserve  signal  magnitude,  ordering,  or  phase  (+/-). 
Magnitude  is  not  preserved  as  the  algorithm  matched  the  signal  mixtures  to  the  model 
pdf.  Signal  order  is  not  preserved  because  the  order  was  merely  convention,  and  there  is 
no  way  for  the  algorithm  to  determine  that  convention  from  the  signal  mixtures.  Phase, 
or  sign  (+/-),  is  not  preserved  due  to  the  symmetry  of  the  pdf  around  zero. 
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When  the  number  of  source  signals  grows  to  M  >2  ,  the  algorithm  is  not 
reliable  in  extracting  M  source  signals  quickly  and  consistently.  Similar  to  the  high 
kurtosis  case,  this  is  likely  because  the  random  mixtures  of  an  increased  number  of 
source  signals  s  result  in  signal  mixtures  x  with  a  “bumpier”  entropy  function 
h( Y)  =  h( Wx)  with  many  local  maximums.  The  increased  occurrence  of  local 
maximums  requires  more  repetitions  of  the  gradient  ascent  function  with  different  initial 
values  for  the  unmixing  matrix  W  to  find  a  starting  point  that  finds  the  global  maximum 
rather  than  a  local  maximum  of  entropy  h(  Y) . 

This  chapter  modeled  the  Infomax  algorithm  for  both  high-kurtosis  and 
polar  NRZ  signals  and  presented  the  results  and  conclusions  of  MATLAB  simulations  for 
each  individual  case.  Chapter  V  presents  the  broad  conclusions  of  the  research  as  well  as 
opportunities  for  future  research  in  this  subject  area. 
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y. 


CONCLUSIONS  AND  FUTURE  WORK 


A.  CONCLUSIONS 

Independent  Component  Analysis  using  the  Infomax  method  of  maximizing 
entropy  proved  to  be  a  feasible  solution  to  the  blind  source  separation  problem  for  small 
numbers  of  signals.  The  assumptions  made  in  Chapter  III.C.2,  although  not  necessarily 
accurate,  were  very  useful  in  that  they  allowed  signals  to  be  extracted  successfully.  The 
gradient  ascent  algorithm  described  in  Chapter  III.D  was  sufficient  to  obtain  solid  results 
for  small  numbers  of  signal  mixtures  (M  <  3) ,  although  greater  numbers  of  mixtures 
required  more  calculation  time  for  convergence.  This  result  was  due  to  the  ability  of  the 
gradient  ascent  algorithm  to  locate  only  the  nearest  local  maximum,  while  the  Infomax 
algorithm  seeks  the  location  of  the  global  maximum  of  entropy. 

By  choosing  a  pdf  that  matched  the  desired  signals  to  be  extracted,  the  Infomax 
algorithm  was  effectively  tailored  to  both  a  typical  high-kurtosis  audio  signal  mixture  and 
a  simple  communications  polar  NRZ  signal  mixture.  In  both  cases,  Infomax  was  found 
to  be  a  speedy  and  reliable  method  of  extracting  a  small  number  of  signals  from  a  small 
number  of  signal  mixtures.  Complexity  of  the  calculations  increased  with  increased 
number  of  source  signals  for  both  types  of  signals.  While  the  added  complexity  of 
additional  signals  makes  the  problem  difficult,  it  is  not  impossible  to  adapt  the  Infomax 
algorithm  for  larger  quantities  of  signals.  Greater  numbers  of  mixtures  would  require 
additional  improvements  to  the  gradient  ascent  algorithm  or  more  calculation  time  in 
order  to  obtain  meaningful  results. 

Overall,  the  Infomax  method  of  ICA  implemented  with  multiple  iterations  of  a 
gradient  ascent  algorithm  as  implemented  in  this  thesis  was  quite  successful  for 
extracting  a  small  numbers  of  the  signals  out  of  the  same  number  of  mixtures.  There  are, 
however,  several  opportunities  for  extended  research  on  this  topic. 
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B.  FUTURE  WORK 

While  ICA  is  still  a  relatively  new  research  field,  its  popularity  is  rapidly 
increasing,  and  opportunities  for  future  work  in  this  subject  matter  are  abundant. 
Research  opportunities  range  from  extensions  of  this  research  to  exploring  other  ICA 
methods. 

1.  Extensions  of  “Independent  Component  Analysis  by  Entropy 

Maximization  (Infomax)” 

a.  Signals  with  Identical  Bit  Rates 

While  Chapter  IV. B  considered  polar  NRZ  signals  with  both  identical  and 
random  amplitudes,  only  randomly  selected  bit  rates  were  analyzed.  Further  work  could 
be  done  to  detennine  the  ability  of  the  Infomax  algorithm  to  distinguish  between  source 
signals  with  identical  bit  rates. 

b.  Gradient  Ascent  Optimization 

The  Infomax  algorithm  was  found  to  converge  consistently  only  for  small 
numbers  (M  <  3)  of  source  signals  and  signal  mixtures.  With  M  =  6  source  signals  and 
mixtures,  some  similarities  between  the  source  and  extracted  signals  were  found  for 
audio  signals,  but  little  similarity  was  found  between  source  and  extracted  polar  NRZ 
signals.  Further  investigation  of  an  optimal  gradient  ascent  algorithm  would  likely 
improve  the  efficiency  of  the  Infomax  algorithm  as  well  as  extend  its  capability  beyond 
just  a  few  signal  mixtures. 

2.  Additional  Signals 

While  this  research  focused  on  audio  signals  and  the  polar  NRZ  communications 
signal,  Infomax  could  be  applied  to  a  variety  of  other  communications  signals. 
Additional  communication  signal  types  such  as  signals  at  intermediate  frequency  (IF)  or 
radio  frequency  (RF)  with  different  probability  density  functions  should  be  considered  as 
well  as  wireless  local  area  network  (WLAN),  cellular  (especially  CDMA),  and 
frequency-hopped  signals. 
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3. 


Other  ICA  Methods 


Infomax  is  just  one  of  many  methods  of  ICA.  Numerous  other  methods  could  be 
developed,  tested,  and  refined  to  determine  the  relative  applicability  and  efficiency  with 
various  signals.  These  methods  include  sphering  or  whitening  (separating  the  signals 
from  additive  white  Gaussian  noise)  to  the  more  advanced  methods  described  in  Chapter 
II.  All  of  these  possibilities  for  additional  research  paint  an  exciting  future  for 
breakthroughs  in  ICA  as  a  method  of  Blind  Source  Separation. 
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APPENDIX  A.  LIST  OF  VARIABLES 


A  arbitrary  event 

E  {X}  expected  value  of  X 

g(y)  model  cdf  through  which  y  is  transfonned 

g  '(y)  pdf  of  source  signal,  also  expressed  as  ps( y) 

g  "(y)  derivative  of  pdf  g  '(y) 

H(A )  entropy  of  event  A 

H(  Y)  entropy  of  transformed  signal  Y 

h(Y)  relative  entropy  of  transformed  signal  Y 

V/?  gradient  of  entropy 

1(A)  information  associated  with  event  A 

J  Jacobian  matrix 

J  determinant  of  Jacobian  matrix 

M  number  of  source  signals 

N  number  of  time  elements 

ps( y)  pdf  of  source  signal,  also  expressed  as  g  '(y) 

pY( Y)  pdf  of  the  mapped  signal  Y  =  g(y) 

Py  (y )  pdf  of  extracted  signal  y 

q( y)  triangular  model  approximation  of  polar  NRZ  cdf 
q  '(y)  triangular  model  approximation  of  polar  NRZ  pdf 
q  "(y)  triangular  model  approximation  of  polar  NRZ  pdf  derivative 
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t  time  index 

W  (M  x  M)  unmixing  matrix 

Wj  single  element  of  ith  row  and  jrh  column  of  W 
x  (M  x  N )  matrix  of  signal  mixtures 

x\  signal  i  of  x  at  time  t 
Y  =  g(y)  ,  transfonned  signal  matrix 

y  (M  x  N)  matrix  of  extracted  signals,  y  =  g  1  ( Y ) 
y\  signal  i  of  y  at  time  t 

a  small  constant,  increased  gradient  ascent  step  size 

P  small  constant,  decreased  gradient  ascent  step  size 

y  small  constant,  base  of  triangular  model  for  polar  NRZ  pdf  approximation 

r]  small  constant,  initial  gradient  ascent  step  size 

cr  small  constant,  compression  factor  for  polar  NRZ  cdf  approx  (tanh  model) 
'F(y)  ratio  g\y)/g\y) 

9{ y)  modified  tanh  model  approximation  of  polar  NRZ  cdf 
0  ’(y)  modified  tanh  model  approximation  of  polar  NRZ  pdf 
9  "(y)  modified  tanh  model  approximation  of  polar  NRZ  pdf  derivative 
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APPENDIX  B.  MAXIMUM  ENTROPY  YIELDS  INDEPENDENT 

SIGNALS  (PROOF) 


The  discrete  version  of  this  proof  is  shown  in  [3]  and  is  derived  in  two  parts.  The 
continuous  version  presented  in  this  Appendix  follows  [3]  closely.  The  first  part  of  this 
proof  shows  that  entropy  is  additive  for  independent  random  events,  while  the  second 
part  utilizes  the  inequality  In  x  >  1  -  ( 1/x)  to  prove  that  entropy  of  dependent  random 

events  is  always  less  than  entropy  of  independent  events.  Since  independent  events  have 
greater  entropy  than  dependent  events,  the  maximization  of  entropy  results  in 
independent  events.  When  applied  to  the  Infomax  principle,  maximizing  entropy  through 
gradient  ascent  results  in  independent  signals. 

A.  ENTROPY  IS  ADDITIVE  FOR  INDEPENDENT  RANDOM  EVENTS 

The  independent  random  events  X  =  (X \,X2,...,Xn)  have  a  joint  pdf 
P x  (x)  =  Px,  (xi  )Px2  (x2 ) ■  ■  ■  Px„  <X  )  •  The  entropy  of  A,  is 

oo 

//(Al)  =  £'{-ln/;V|(xl)}  =  -J  px  (xl)\npx  (xl)dxl .  (115) 

-oo 

The  joint  entropy  7/(X)  =  E  |-ln/?x(x)}  is 

oo  oo 

//(X)=-J-"J  PxS^)---  Px„  (Xn  )  ln  [Px,  (*1 ) ■  ■  ■  Pxn  (Xn  )]  dxV  dxn  •  (  1  1 6) 

-00  -00 

From  the  properties  of  logarithms, 

In  [ PXl  ( A  )  •  •  •  pXn  (xn )]  =  In  pXi  (xl  )  +  •••  +  ln  pK  (xn )  (117) 

and  entropy  H (X)  can  be  rewritten  as 

oo  oo 

H  (x)  =  -  J  "  •  J  Pxt  (X1 )  •  •  •  Px„  (xn )  [ln  Px,  (Xi  )  +  •••  +  In  pXn  (xn ) ]  c/X|  ■  •  •  dx„ .  (118) 

-00  -00 

This  expression  for  entropy  can  again  be  rearranged,  yielding 

oo  oo 

H(X)  =  ~  J  Px,  (fi )  In  pXi  (x,  )cLx,  +••■+{  /7v„  (x„ ) ln  Pxri  (x„  )dx„  ■  (U9) 
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From  Equation  (115),  H  =  -  J  px>  (Xj)ln  px>  (x^dx^ ,  so  Equation  (119)  can  again  be 

-oo 

rewritten  as 

H(X)  =  H(Xl)  +  --  +  H(Xn),  (120) 

which  proves  entropy  is  additive  for  independent  random  events. 


B.  ENTROPY  IS  LESS  FOR  DEPENDENT  RANDOM  EVENTS  THAN 
INDEPENDENT  RANDOM  EVENTS 

Equation  (120)  is  true  for  any  number  of  independent  random  events  and  can  be 
expressed  as  the  summation 


H(Xt)  +  -  +  H(X„)  =  '£H(Xk). 

k= 1 

n 

The  summation  Y  II  ( X k )  can  also  be  expressed  as 


(121) 


k= 1 


k= 1  k= 1 


J  Pxt(xk)]nPxk  (xk)dxk 


(122) 


or 


(123) 


(124) 


n  ~  n 

=  Px(Xl’---’  xn)^j  111  Pxk  (xk)dx j  •  •  •  dxn . 

k=\  _oo  —oo  k= 1 

Using  the  properties  of  logarithms,  we  can  rewrite  Equation  (123)  as 

n  00  00  n 

Y,  H  ( ■ Xk  )  =  -  J  '  • '  J  Px  (*1  >  •  •  • » ■ Xn  )  In  El  Pxk  iXk  )^r  • '  dXn  • 

k—\  —oo  —oo  k— 1 

The  second  part  of  this  proof  assumes  X  =  (Xl,X2,...,Xn)  are  not  the 
independent  random  events  of  part  one  but  rather  dependent  random  events.  As  X  is  no 
longer  independent,  the  pdfs  do  not  multiply  as  in  part  one.  Instead,  entropy  H  (X)  is 

oo  oo 

H(x)  =  H(Xi  =  -J  /U  -  U) ' ' '  Px„  (x„ )  \nPx  Up  •  •  • ,  U  )^i  ■'■dx„.(\  25) 


-oo  -oo 


To  prove  that  entropy  is  less  for  dependent  random  events  than  independent 
random  events,  Equation  (125)  is  subtracted  from  Equation  (124)  to  get 
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(126) 


Y,h{x,)-h(x„...,x,)  = 

k=\ 

00  00  n 

=  -  J  •  "  J  Px  0l  >  •  ■  ■  ’Xn  )  ' ^  n  Pxt  (XK  )dxi  •  •  ■  dxn 

—00  —00  k  =  \ 

00  00 

+  J  -"J  Px(x],...,Xn)---pXn(xn)\npx(x],...,xn)dxr--dxn 

-oo  -oo 

00  00 

Factoring  out  J  -"J  px(xx,...,xn) ,  we  get 


=  |  ’J  Px(XV'’Xn ) 


-00  -00 


In 


flM**) 

V  *=i 


+  ln(/jx(x1,...,x„)) 


t/x,  •  •  ■  dxn . 


Since  ln(a)  +  ln(h)  =  In (ab) ,  Equation  (128)  can  be  rewritten  as 

r  \ 


00  00 


=  J  -  J  Px(XV-’Xn) 


-00  -00 


In 


Px(x \,  —  ,xn) 


dx  j  •  •  •  dxn . 


n  PxMk) 

k=i 

The  inequality  In  x  >  1  -  (l/x)  can  be  applied  to  Equation  (129),  resulting  in 

n 

Y\pxMk) 


00  00 


^  j  -J  Px(XV-’Xn) 


1  — 


*=1 


Px(xt,...,xn) 


dx  j  •  •  •  dxn . 


Distribution  of  the  integral  yields 


oo  oo 


oo  oo 


-  j  '"j  Px(Xl’---’Xn)dxi---dXn-  J  J 


/?x(x,,...,x„)I7^(  (x/c) 


k=\ 


—oo  —oo 


-oo  -oo 


(127) 


(128) 


(129) 


(130) 


-dx]  •  •  •  dxn .  (131) 


Px(xi,...,x„) 

The  first  part  of  Equation  (131)  is  the  area  under  the  joint  pdf,  which  equals  one.  The 
second  part  of  Equation  (131)  is  the  product  of  the  area  under  each  independent  pdf, 
which  also  equals  one.  Therefore, 


£,H(XiyH(X„...,X,)>l-l=0 


(132) 


k= 1 


Rearranging  Equation  (132),  we  obtain 
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(133) 


h(x„...,x,)<Y,h(x,), 

k= 1 

thus,  proving  that  dependent  random  events  have  less  entropy  than  independent  random 
events. 
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APPENDIX  C.  THE  TRIANGULAR  MODEL  APPROXIMATION 

(DERIVATION) 


A.  PDF 

The  approximate  polar  NRZ  pdf  (shown  in  Figure  19  and  reproduced  below)  was 
found  by  breaking  the  pdf  into  the  regions  shown  in  the  figure,  and  the  slope  and  v  - 
intercept  of  each  region  was  found  for  non-zero  regions  according  to  the  slope-intercept 
form  z  =  my  +  b . 


Approximate  Polar  NRZ  Probability  Density  Function  (pdf) 


Figure  26  Approximate  polar  NRZ  pdf  (from  Figure  19). 


First,  the  height  of  each  triangle  is  found  according  to  the  equation  for  the  area  of  a 
triangle 

A  =  ^bh  (134) 

where  area  A  is  one-half  the  base  b  times  the  height  h  .  As  the  total  area  under  a  pdf  is 
equal  to  one,  the  area  of  each  triangle  is  0.5 .  The  base  b  =  2y ,  so 


h  = 


2  A 


2(0.5)  _  1 
2y 


(135) 


b  2  y 

The  slope  of  regions  2  and  5  is  the  same,  and  the  slope  of  regions  3  and  6  is  the  negative 
of  regions  2  and  5.  Slope  in  =  Az/Ay  and  was  found  to  be 


Az  /2  y  0 

m  =  —  =  - — - - 

Ay  l-(l-y) 


1 


2  f 


(136) 
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The  slope  is  positive  in  regions  2  and  5  and  negative  in  regions  3  and  6. 


The  y  -intercept  b  is  found  according  to  the  slope-intercept  equation  z  =  my  +  b . 
For  regions  2  and  4, 


For  regions  3  and  5, 


1  (1  +  7) 

b  =  z-my  =  0-—(\  +  r)  =  —— 

2  y  2  7“ 


1  (7-1) 

b  =  z-my  =  0-— -(y-l)  =  — — 
2  y  2  y 


The  resultant  expression  for  the  approximate  polar  NRZ  pdf  q '( v)  is 


q\y)  = 


0 


y  <-(!  +  /) 


1 

2y2 

y+ 

1  +  y 
2y2 

-(1  +  7)  <  y<- 

-1 

2y2 

y+ 

r- 1 

2y2 

-1<  v<7-l 

0 

l 

V 

VI 

7 

1 

2y2 

y+ 

Y- 1 
2y2 

V 

VI 

1 

-1 

1  +  7 

1  <  v  <  1  +  7 

2y2 

y+ 

2r 

0 

v  >  1  +  7 

(137) 


(138) 


(139) 


B.  CDF 

The  approximate  cdf  q(y)  was  found  by  integrating  the  pdf  q  \y) . 


For  region  2, 


q(y)  =  \q'(y)=  j  (my'+b)dy'  =  ]-my'2  +  by' 

-(i +r)  1 

q(y)  =  ^  m  (y2  -  (i + r)2 ) + b  {y  -  (i  -  r))  ■ 


-(i +r) 


(140) 

(141) 
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For  region  3, 


For  region  5, 


q(y)  =  \q  \y)  =  0.25  +  J  (my '+  b)  dy  ’  =  0.25  +  -  my  ’2  +  by 

-i  \_^ 

q(y)  =  0.25+  ^m(l-y2)  +  b(l  +  y)  . 
q(y)  =  \q\y)  =  0-5+  }  (; my'+b)dy' =  0.5  +  ±-my'2+by ' 

i -r  2  i 

q(y)  =  o.5  +  ( v2  -  (l  -yf)+b  ( v  -  (l  -  r)) 


(142) 


(143) 


(144) 


(145) 


For  region  6, 


q(y)  =  J  q  ’( y)  =  0.75  +  J  (my '+  b)  dy  ’  =  0.75  +  -  my  '2  +  by 

1  L2 

q(y)  =  0.75  +  ^m  (l  -y2)  +  b(y- 1) 


(146) 


(147) 


The  resultant  expression  for  the  approximate  polar  NRZ  cdf  q(y )  is 


j  m  [y2  -  (1  +  rf )  +  b  (y  -  (1  -  y)) 
0.25+  ]-m(\-y2)  +  b(\  +  y) 


q(y)  =  \  0-5 


y<-(l  +  y) 

~{\  +  y)<y<-\ 

-\<y<y-\ 
y-\<y<l-y  (148) 


q(y)  =  0-5+  ^m(y2 -(l-y)2)  +  b(y-(l-y))  \-y<y<\ 

q(y)  =  0.75  +  ^7«(l-j2)  +  6(v-l)  1<  y  <1  +  y 

y  >  i  +  r 
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APPENDIX  D.  INFOMAX  CODE  FOR  HIGH-KURTOSIS  SIGNALS 


INFOMAX 


LT  Jennie  H.  Garvey 
01  May  2007 


%  Adapted  from: 

%  James  V.  Stone's 
%  ICA  Tutorial 
%  Appendix  D 


DESCRIPTION: 

3  high-kurtosis  signals 

multiple  iterations  of  a  modified  gradient  ascent  function 
cdf/pdf:  hyperbolic  tangent 


clear 


INPUT  Section 


%  All 
%  for 


inputs  are  entered  here 
ease  of  adapting  code 


%  Enter  number  of 
M  =  3; 

%  Enter  number  of 
N  =  10000; 

%  Enter  maximum  # 
maxiter  =  100; 


signals  (make  modifications  for  add'l 


samples 


of  iterations  for  Gradient  Ascent 


sigs  below) 
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%  Enter  inital  step  size  for  Gradient  Ascent 
eta  =  . 1 ; 

%  Enter  increased  step  size  for  Gradient  Ascent 
alpha  =  1.2; 

%  Enter  decreased  step  size  for  Gradient  Ascent 
beta  =  0.1; 

%  Enter  #  of  times  to  repeat  Gradient  Ascent 
gradasiter  =  5; 


MUST  DEFINE  cdf,  pdf,  dpdf  in  GRADIENT  ASCENT  function 


%  %  Set  Random  Number  State 

%  state=2;  rand (' state state) ;  randn (' state state) 

%  Create  "random"  signals. . . 

M  =  M;  %  #  of  signals 

N  =  N;  %  #  of  samples 

i  =  1 :  N; 

load  chirp;  sl=y(i);  sl=sl/std (si ) ; 
load  gong;  s2=y(i);  s2=s2 /std ( s2 ) ; 
load  splat;  s3=y(i);  s3=s3/std (s3) ; 

%  Create  "unknown"  signal  mixture  x 
w  =  rand (M) ; 

s  =  [sl,s2,s3];  %column  vectors 
x  =  s  *  w  ; 
mixture  =  x; 

%  Perform  GRADIENT  ASCENT  repetitions 
for  j  =  1 : gradasiter ; 

if  j  ==  1 

W  =  eye (M) ;  %initializes  1st  W  to  identity  matrix 

else 

W  =  2  *  j  *  rand (M) ;  %randomly  selects  subsequent  W,  increases  w/  j 

end 

[y , hs , grads , etas , w]  =  GradientAscent_highkurtosis (N, maxi ter , alpha, beta, eta, x, w) 

%Create  temporary  arrays  to  store  values  from  each  iteration 
ytemp ( j , : , : )  =  y; 
hstemp ( j , : , : )  =  hs; 
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gradstemp ( j , : , : )  =  grads; 
etastemp ( j , : , : )  =  etas; 

Wtemp  ( j  ,  :  ,  : )  =  fit- 

end 

%  Select  iteration  with  highest  ENTROPY 

[maxh, index]  =  max (max (hstemp ' ) ) ;  %finds  highest  entropy  index 

%  Select  y,  hs,  grads,  etas,  W  from  highest  entropy  index 

y  ( : , : )  =  ytemp (index, :,:) ; 

hs  ( :  ,  :  )  =  hstemp  (index, 

grads  ( :  ,  : )  =  gradstemp  (index,  : ,  : )  ; 

etas  =  etastemp  (index, 

W  ( : ,  : )  =  Wtemp  (index, 

%  Break  out  extracted  signals  (column  vectors) 
yl  =  y  ( : ,  1 )  ;  y2  =  y(:,2);  y3  =  y(:,3); 


%%  PLOTS 


%Plot  original  signals 
figure ( 1 ) ; 

subplot (3, M, 1) ;  plot(i,sl);  title (' \bfSignal  1');  xlabel (' discrete  time'); 
ylabel ( 'Voltage' ) 

subplot (3, M, 2) ;  plot(i,s2);  title (' \bfSignal  2');  xlabel (' discrete  time'); 
ylabel ( 'Voltage' ) 

subplot (3, M, 3) ;  plot(i,s3);  title (' \bfSignal  3');  xlabel (' discrete  time'); 
ylabel ( 'Voltage' ) 


%Plot  signal  mixtures 

subplot (3, M, 4) ;  plot ( i , mixture (:, 1 )) ;  title (' \bfSignal  Mixture  1');  xlabel (' discrete 
time ' ) ;  ylabel ( 'Voltage ' ) 

subplot (3, M, 5) ;  plot ( i , mixture (:, 2 )) ;  title (' \bfSignal  Mixture  2');  xlabel (' discrete 
time ' ) ;  ylabel ( 'Voltage ' ) 

subplot (3, M, 6) ;  plot ( i , mixture (:, 3 )) ;  title (' \bfSignal  Mixture  3');  xlabel (' discrete 
time');  ylabel (' Voltage ' ) 


%Plot  extracted  signals 

subplot (3, M, 7) ;  plot(i,yl);  title (' \bfExtracted  Signal  1');  xlabel (' discrete  time'); 
ylabel ( 'Voltage' ) 
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subplot (3, M, 8) ;  plot(i,y2);  title (' \bfExtracted  Signal  2');  xlabel (' discrete  time') 
ylabel ( 'Voltage' ) 

subplot (3, M, 9) ;  plot(i,y3);  title (' \bfExtracted  Signal  3');  xlabel (' discrete  time') 
ylabel ( 'Voltage' ) 

%  %  Plot  PDFs  of  "Random"  Signals  (Optional) 

%  figure (2);  subplot (2, 1, 1) ;  hist ( si , N/ 100 ) ;  title (' \bf PDF  of  Signal  1') 

%  figure (2);  subplot ( 2 , 1 , 2 ) ;  hist ( s2 , N/ 100 ) ;  title (' \bf PDF  of  Signal  2') 


%Plot  change  in  h  and  gradient  magnitude  during  optimization 
figure  (3)  ; 

subplot (3,1,1) ;plot  (hs) ; 

title (' \bf Function  Values  -  Entropy '); xlabel (' Iteration ') ;  ylabel (' h (Y) ') ; 
subplot (3,1,2) ;plot (grads) ; 

title (' \bfMagnitude  of  Entropy  Gradient '); xlabel  (' Iteration ') ;  ylabel (' Gradient 
Magnitude ' ) ; 

subplot (3, 1, 3) ;plot (etas)  ; 

title (' \bfMagnitude  of  Eta '); xlabel (' Iteration ') ;  ylabel ('Eta  Magnitude'); 


%  %  Modified  plots  of  signals 
%  figure (4) ; 

%  subplot (1,M,  1) ;  plot (i, si); 
%  subplot (1,M, 2)  ;  plot(i,s2); 
%  subplot (1,M, 3) ;  plot(i,s3); 

%  figure  (5)  ; 

%  plot(i,x);  title (' \bfSignal 

%  figure  (6) ; 

%  subplot  (1,M,  1)  ;  plot(i,yl); 
%  subplot (1,M, 2) ;  plot(i,y2); 
%  subplot (1,M, 3) ;  plot(i,y3); 


and  extracted  signals 

title ( ' \bfSignal  1 ' ) ; 
title ( ' \bfSignal  2'); 
title (' \bf Signal  3'); 

Mixtures ' ) 

title ( ' \bfExtracted  Signal  1 ' ) ; 
title ( ' \bfExtracted  Signal  2 ' ) ; 
title (' \bfExtracted  Signal  3'); 


%soundsc (y2, Fs) ; 


%  Listen  to  audio  signals  . . . 

%  [10000]  Fs  Sample  rate  of  speech. 
listen=l ; 

Fs=10000; 

if  listen  soundsc (yl , Fs ) ; 

end 

w;  %  Mixing  Matrix 

W;  %  Unmixing  Matrix 

I=w*W  %  should  yield  Identity  matrix 

index 
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APPENDIX  E.  GRADIENT  ASCENT  FUNCTION  (HIGH- 

KURTOSIS) 


%  GRADIENT  ASCENT  for  HIGH  KURTOSIS  signals 

%  This  function  performs  a  Gradient  Ascent  algorithm  that  varies  its  "step" 

%  size  as  a  maximum  is  approached  -  input  cdf,  pdf,  and  dpdf  required! 

%  eta  =  initial  step  size,  alpha  =  increased  step  size,  beta  =  decreased  step  size 
%  CDF,  PDF,  DPDF  INPUTS:  hyperbolic  tangent 

% [y , hs , grads , etas , W]  =  GradientAscent_highkurtosis (N, maxi ter , alpha, beta, eta, x, W) 


function [y,  hs,  grads,  etas,  Wj  =  GradientAscent_highkurtosis (N,  maxiter,  alpha,  beta 
eta,  x,  W) 


%  Create  arrays  to  store  values  of  h,  grad,  eta,  and  W 

hs  =  zeros (maxiter , 1 ) ;  gs  =  zeros (maxiter , 1 ) ;  etas  =  zeros (maxiter, 1) ; 

for  iter  =  l:maxiter 
y  =  x  *  W  ; 


%  INPUT  for  Gradient  Ascent  % 


%  Input  model  CDF  [ Y=g ( y ) ] 

Y  =  tanh (y) ; 

%  Input  model  PDF  [cdf'=pdf=g' (y) ] 
pdf  =  (1  -  tanh(y).A2); 

%  Input  PDF  derivative  [dpdf=g' '  (y) ] 
dpdf  =  -2  *  tanh(y)  +  2  *  tanh(y).A3; 


%  End  INPUT  -  more  in  'else'  below!  % 


psi  =  (dpdf)./  (eps  +  pdf);  %or:  psi  =  -2*Y  (for  high-kurtosis  sigs  only); 
detw  =  abs(det(W)); 

%  Calculate  entropy  for  current  iteration 
h  =  (1  /  N)  *  sum (sum (log (eps  +  pdf)))  +  log(detw); 

if  iter  >  1 

if  h  >  hsfiter  -  1)  %(if  entropy  increased) 
eta  =  alpha  *  etas (iter  -  1); 

else  %  h  <  hs  (iter  -  1):  (h  got  smaller  -  entropy  decreased) 
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W  =  Wold; 

eta  =  beta  *  etas (iter  -  1); 
y  =  x  *  W  ; 


%  INPUT  for  ELSE  loop  % 


%  Use  same  cdf,  pdf  as  above!  % 


%  Input  model  CDF 
Y  =  tanh (y) ; 

%  Input  model  PDF 
pdf  =  ( 1-tanh (y) . A2 ) ; 

%  Input  PDF  derivative 

dpdf  =  -2*tanh (y) +2*tanh (y) . A3; 


%  End  INPUT 


%Recalculate  detw,  h,  psi  for  Wold 
detw  =  abs(det(W)); 

h  =  ( 1/N) *sum (sum (log (eps+pdf ) ) )  +  log(detW); 
psi  =  (dpdf )./ (eps+pdf ) ; 

end 

else  %  iter  <=  1 

h  =  h; 

end 

grad  =  inv(W')  +  (1  /  N)  *  (x1  *  psi) ; 

W  =  W  +  eta  *  grad; 

%Record  h,  grad,  eta,  W 

hs(iter)  =  h;  grads (iter)  =  norm (grad (:)) ;  etas (iter)  =  eta; 

end 

%OUTPUTS  y,  hs,  grads,  etas 

y  =  y;  hs  =  hs;  grads  =  grads;  etas  =  etas;  W  =  W; 


Wold  =  W 


82 


APPENDIX  F.  POLAR  NRZ  SIGNAL  FUNCTION 


%  POLAR  NRZ  SIGNAL 

%  This  function  returns  a  vector  representing  a  polar  NRZ  signal. 
%  s  =  polarNRZ (totaltime,  bitrate,  samplerate) 


function  s  =  polarNRZ (totaltime,  bitrate,  samplerate) 

%  STEP  1  -  Generate  random  bit  sequence 

bits  =  rand (totaltime  *  ceil (bitrate) ,  1)  <  0.5; 

bits  =  2  *  (bits  -  .5);  %converts  bits  from  0,1  to  -1,1 

%  STEP  2  -  Duplicate  bit  sequence  n  times,  where  n  =  samplerate  /  bitrate 
n  =  ceil (samplerate/bitrate) ; 
x  =  [  ]; 
for  i  =  l:n 

x  =  [x,bits];  %  duplicates  column  vector 

end 

%  STEP  3  -  Concatenate  rows  of  x 
y  =  t  1 ; 

for  i  =  1 ; length (bits) 
y  =  [  y  x  ( i ,  : )  ]  ; 

end 

actualsamples  =  totaltime  *  samplerate; 
s  =  y (1 : actualsamples) ; 

%  STEP  4  -  change  first  bit  to  be  random  (i.e.  not  always  at  the  beginning  of  a  bit) 

shift  =  ceil (rand*samplerate/bitrate)  ; 
s  =  [s (shift  +  1 : actualsamples)  s  ( 1 : shift)]'; 
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APPENDIX  G.  POLAR  NRZ  CDF  FUNCTION:  TRIANGULAR 

MODEL 


%  POLAR  NRZ  CDF 

%  This  function  returns  an  approximate  cdf  of  a  polar  NRZ  signal 
%  using  the  triangular  model  with  gamma  as  small  base  constant 
%  z  =  polarNRZcdf (y,  gamma) 


function  z  =  polarNRZcdf (y, 
m  =  1  /  (2  *  gamma A2) ; 
bpos  =  1  +  gamma; 
bneg  =  gamma  -  1; 
bdenom  =  2  *  gammaA2; 
bl  =  bpos  /  bdenom; 
b2  =  bneg  /  bdenom; 
z  =  zeros (size (y) ) ; 


gamma) 

%  slope  (pos  or  neg) 

%  numerator  of  y-intercept  (pos) 

%  numerator  y-intercept  (neg) 

%  denominator  of  y-intercept  b 
%  y-intercept  (pos) 

%  y-intercept  (neg) 

%  creates  array  to  store  values  for  cdf 


%  Define  "critical  points"  -  where  equations  change 
critPoints  =  [-(1+gamma)  -1  (-1+gamma)  (1-gamma)  1  (1+gamma) ] ; 


%  Region  1  equation  (z  <  -1-gamma) 

%  z  =  0  -  do  nothing! 

%  Region  2  equation  (-1-gamma  <  z  <=  -1) 

z2  =  (0.5  *  m  *  (y.A2  -  (1  +  gamma) A2)  +bl*  (y  +  1  +  gamma)).*  (y  >= 
&  y  <  critPoints (2 )) ; 

%  Region  3  equation  (-1  <  z  <=  -1+gamma) 

z3  =  (.25  +  (.5  *  m  *  (1  -  y.A2)  +  b2  *  (1  +  y) ) ) . *  (y  >=  critPoints (2 ) 
critPoints (3) ) ; 

%  Region  4  equation  (-1+gamma  <  z  <=  1-gamma) 

z4  =  .5.*  (y  >=  critPoints (3)  &  y  <  critPoints ( 4 )) ; 

%  Region  5  equation  (1-gamma  <  z  <=  1) 

z5  =  (.5  +  (.5  *  m  *  (y.A2  -  (1  -  gamma) A2)  +  b2  *  (y  -  (1-gamma)))).* 
critPoints (4)  &  y  <  critPoints (5) )  ; 

%  Region  6  equation  (1  <  z  <=  1+gamma) 

z6  =  (.75  +  (.5  *  m  *  (1  -  y.A2)  +  bl  *  (y  -  1))).*  (y  >=  critPoints  (5) 
critPoints (6) ) ; 

%  Region  7  equation  (z  >  1+  gamma) 
z7  =  (y  >=  critPoints ( 6) ) ;  %  always  1 
%  Combine  values  of  each  region  for  output  cdf 
z  =  z2  +  z3  +  z4  +  z5  +  z6  +  z7; 


critPoints  ( 1 ) 


&  y  < 


(y  >= 


&  y  < 
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APPENDIX  H.  POLAR  NRZ  PDF  FUNCTION:  TRIANGULAR 

MODEL 


%  POLAR  NRZ  PDF 

%  This  function  returns  an  approximate  pdf  of  a  polar  NRZ  signal 
%  using  the  triangular  model  with  gamma  as  small  base  constant 
%  z  =  polarNRZpdf (y,  gamma) 


function  z  =  polarNRZpdf (y,  gamma) 


m  =  1  /  (2  *  gamma A2) ; 
bpos  =  1  +  gamma; 
bneg  =  gamma  -  1; 
bdenom  =  2  *  gammaA2; 
bl  =  bpos  /  bdenom; 
b2  =  bneg  /  bdenom; 


slope  (pos  or  neg) 
numerator  of  y-intercept  (pos) 
numerator  y-intercept  (neg) 
denominator  of  y-intercept  b 
y-intercept  (pos) 
y-intercept  (neg) 


z  =  zeros (size (y)  )  ; 


%  creates  array  to  store  values 


%  Define  "critical  points"  -  where  equations  change 
critPoints  =  [-(1+gamma)  -1  (-1+gamma)  (1-gamma)  1  (1+gamma) ] ; 

%  Region  1  (z  <  -1-gamma)  -  always  0 
%  Region  2  equation  (-1-gamma  <  z  <=  -1) 

z2  =  (m  *  y  +  bl).*(  y  >=  critPoints ( 1 )  &  y  <  critPoints (2 )) ; 

%  Region  3  equation  (-1  <  z  <=  -1+gamma) 

z3  =  (-m  *  y  +  b2).*  (y  >=  critPoints (2 )  &  y  <  critPoints (3) ) ; 
%  Region  4  (-1+gamma  <  z  <=  1-gamma)  -  always  0 
%  Region  5  equation  (1-gamma  <  z  <=  1) 

z5  =  (m  *  y  +  b2).*  (y  >=  critPoints ( 4 )  &  y  <  critPoints (5) ) ; 

%  Region  6  equation  (1  <  z  <=  1+gamma) 

z6  =  (-m  *  y  +  bl).*  (y  >=  critPoints (5)  &  y  <  critPoints ( 6) ) ; 
%  Region  7  (z  >  1+  gamma)  -  always  0 

%  Combine  values  of  each  region  for  output  pdf 
z  =  z2  +  z3  +  z5  +  z6; 


87 


THIS  PAGE  INTENTIONALLY  LEFT  BLANK 


88 


APPENDIX  I.  POLAR  NRZ  DPDF  FUNCTION:  TRIANGULAR 

MODEL 


function  z  =  polarNRZdpdf (y,  gamma) 

m  =  1  /  (2  *  gammaA2) ;  %  slope  (pos  or  neg) 

z  =  zeros (size (y) ) ;  %  create  array  to  store  values 

%  Define  "critical  points"  -  where  equations  change 
critPoints  =  [-(1+gamma)  -1  (-1+gamma)  (1-gamma)  1  (1+gamma) ] ; 

%  Region  1  (z  <  -1-gamma)  -  always  0 

%  Region  2  equation  (-1-gamma  <  z  <=  -1) 

z2  =  (m)  . *  (y  >=  critPoints ( 1 )  &  y  <  critPoints (2 ))  ; 

%  Region  3  equation  (-1  <  z  <=  -1+gamma) 

z3  =  (-m).*  (y  >=  critPoints (2 )  &  y  <  critPoints (3) ) ; 

%  Region  4  (-1+gamma  <  z  <=  1-gamma)  -  always  0 

%  Region  5  equation  (1-gamma  <  z  <=  1) 

z5  =  (m) . *  (y  >=  critPoints (4)  &  y  <  critPoints (5) ) ; 

%  Region  6  equation  (1  <  z  <=  1+gamma) 

z6  =  (-m).*  (y  >=  critPoints (5)  &  y  <  critPoints ( 6) ) ; 

%  Region  7  (z  >  1+  gamma)  -  always  0 

%  Combine  values  of  each  region  for  output  dpdf 
z  =  z2  +  z3  +  z5  +  z6; 
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APPENDIX  J.  POLAR  NRZ  CDF  FUNCTION:  TANH  MODEL 


%  POLAR  NRZ  CDF2 

%  This  function  returns  an  approximate  cdf  of  a  polar  NRZ  signal 
%  using  the  modified  hyperbolic  tangent  model  w/  compression  factor  sigma 
%  z  =  polarNRZcdf2 (y,  sigma) 


function  z  =  polarNRZcdf2 (y,  sigma) 

z  =  zeros (size (y) ) ;  %  creates  array  to  store  values 

%  Define  "critical  point"  -  where  equation  changes 
critPoint  =  0; 

%  Equation  for  z  <  0 

zl  =  (.25  *  (tanh (sigma  *  (y  +  1) )  +1)).*  (y  <  critPoint); 

%  Equation  for  z  >=  0 

z2  =  (.25  *  (tanh (sigma  *  (y  -  1) )  +3)).*  (y  >=  critPoint); 

%  Combine  values  of  each  region  for  output  cdf 
z  =  zl  +  z2; 
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APPENDIX  K.  POLAR  NRZ  PDF  FUNCTION:  TANH  MODEL 


function  z  =  polarNRZpdf2 (y,  sigma) 

z  =  zeros (size (y) ) ;  %  creates  array  to  store  values 

%  Define  "critical  point"  -  where  equation  changes 
critPoint  =  0; 

%  Equation  for  z  <  0 

zl  =  ((sigma  /  4)  *  (1  -  tanh (sigma  *  (y  +  1)).A2)).*  (y  <  critPoint); 

%  Equation  for  z  >=  0 

z2  =  ((sigma  /  4)  *  (1  -  tanh (sigma  *  (y  -  1)).A2)).*  (y  >=  critPoint); 

%  Combine  values  of  each  region  for  output  pdf 
z  =  zl  +  z2; 
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APPENDIX  L.  POLAR  NRZ  DPDF  FUNCTION:  TANH  MODEL 


%  POLAR  NRZ  DPDF2 

%  This  function  returns  an  approximate  pdf  derivative  of  a  polar  NRZ  signal 
%  using  the  modified  hyperbolic  tangent  model  w /  compression  factor  sigma 
%  z  =  polarNRZdpdf 2 (y,  sigma) 


function  z  =  polarNRZdpdf 2 (y,  sigma) 

z  =  zeros (size (y) ) ;  %  creates  array  to  store  values 

%  Define  "critical  point"  -  where  equation  changes 
critPoint  =  0; 

%  Equation  for  z  <  0 


zl  =  -((sigmaA2)  /  2).* 
critPoint) ; 

(tanh (sigma  * 

(y  +  1)  ) 

. *  polarNRZpdf (y,  sigma)) 

.  *  (y  < 

%  Equation  for  z  >=  0 

z2  =  -((sigmaA2)  /  2).* 

(tanh (sigma  * 

(y  -  1)  ) 

. *  polarNRZpdf (y,  sigma)) 

.*  (y  > 

critPoint) ; 


%  Combine  values  of  each  region  for  output  dpdf 
z  =  zl  +  z2; 
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APPENDIX  M.  INFOMAX  CODE  FOR  POLAR  NRZ  SIGNALS 


INFOMAX 


LT  Jennie  H.  Garvey 
01  May  2007 


%  Adapted  from: 

%  James  V.  Stone's 
%  ICA  Tutorial 
%  Appendix  D 


DESCRIPTION: 

3  polar  NRZ  signals 

multiple  iterations  of  a  modified  gradient  ascent  function 
cdf/pdf:  modified  hyperbolic  tangent  function 


clear 

format  compact 

%  %  Set  Random  Number  State 

%  state=2;  rand (' state state) ;  randn (' state state) 


INPUT  Section 


%  All  inputs  are  entered  here  % 
%  for  ease  of  adapting  code  % 


%  Enter  number  of  signals  (will  need  to  make  modifications  for  add'l  sigs  below) 
M  =  3; 
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%  Enter  number  of  samples 


N  =  100; 

%  Enter  lenth  of  time  vector  in  seconds 
totaltime  =  5; 

%  Enter  sample  rate 
samplerate  =  100; 

%  Enter  maximum  #  of  iterations  for  Gradient  Ascent 
maxiter  =  100; 

%  Enter  inital  step  size  for  Gradient  Ascent 
eta  =  .05; 

%  Enter  increased  step  size  for  Gradient  Ascent 
alpha  =  1.2; 

%  Enter  decreased  step  size  for  Gradient  Ascent 
beta  =  0.1; 

%  Enter  #  of  times  to  repeat  Gradient  Ascent 
gradasiter  =  100; 

%  Enter  new  W  step  for  multiple  Gradient  Ascent  iterations 
step  =  0.01; 

%  Enter  compression  factor  for  polar  NRZ  functions 
sigma  =  2; 


%  MUST  DEFINE  cdf,  pdf,  dpdf  in  GRADIENT  ASCENT  function 


%  Create  "random"  signals... 

M  =  M;  %  #  of  signals,  this  will  have  to  be  changed  below  if  M  is  increased 

N  =  N;  %  #  of  samples 

i  =  1 :N; 

%sl= (rand (1) ) . *polarNRZ (totaltime, 10*rand (1) , samplerate) ;  %sl=sl/std (si) ; 

%s2= (rand ( 1 ) )  . *polarNRZ (total time, 10* rand ( 1 ) , samplerate)  ;  %s2=s2 /std ( s2 ) ; 
si  =  (rand (1) ). *polarNRZ (totaltime, 10*rand (1) , samplerate) ; 
s2  =  (rand ( 1 )). *polarNRZ (totaltime, 10*rand ( 1 ), samplerate) ; 
s3  =  (rand ( 1 )). *polarNRZ (totaltime, 10*rand ( 1 ), samplerate) ; 
s  =  [si  s2  s3] ;  %column  vectors 

%  Create  "unknown"  signal  mixture  x 
w  =  rand (M) ; 
x  =  s  *  w  ; 
mixture  =  x; 

%  Perform  GRADIENT  ASCENT  repetitions 
for  j  =  1 : gradasiter ; 
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if  j 


1 


W  =  eye (M) ;  %initializes  1st  W  to  identity  matrix 

else 

W  =  step  *  j  *  rand (M) ;  %randomly  selects  subsequent  W,  increases  w/  j 

end 

%  Call  GradientAscent  function  to  perform  gradient  ascent 

[y , hs , grads , etas , W]  =  GradientAscent_polarNRZ2 (N, maxi ter , alpha, beta, eta, x, W, sigma) 

%  Create  temporary  arrays  to  store  values  from  each  iteration 

ytemp ( j , : , : )  =  y; 

hstemp ( j , : , : )  =  hs; 

gradstemp ( j , : , : )  =  grads; 

etastemp ( j , ; , : )  =  etas; 

Wtemp  ( j  ,  : ,  : )  =  fib- 

end 

%  Select  iteration  with  highest  ENTROPY 

[maxh, index]  =  max (max (hstemp ')) ;  %finds  highest  entropy  index 

%  Select  y,  hs,  grads,  etas,  W  from  highest  entropy  index 

y  ( : , : )  =  ytemp (index, :,:) ; 

hs ( : ,  : )  =  hstemp (index,  : ,  : )  ; 

grads  ( :  ,  : )  =  gradstemp  (index,  :  ,  :  )  ; 

etas ( : , : )  =  etastemp (index, 

W(:,:)  =  Wtemp (index, :,:) ; 

%  Break  out  extracted  signals  -  using  column  vectors 
yl  =  y(:,l);  y2  =  y(:,2);  y3  =  y(:,3); 


%%  PLOTS 


%Plot  original  signals 
figure ( 1 ) ; 

subplot (3, M, 1) ;  plot (si);  title (' \bfSignal  1');  xlabel (' discrete  time'); 
ylabel ( 'Voltage' ) 

subplot (3, M, 2) ;  plot(s2);  title (' \bfSignal  2');  xlabel (' discrete  time'); 
ylabel ( 'Voltage' ) 
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subplot (3, M, 3) ;  plot(s3);  title (' \bfSignal  3');  xlabel (' discrete  time'); 
ylabel ( 'Voltage' ) 

%Plot  signal  mixture 

subplot (3, M, 4) ;  plot (mixture (:, 1 )) ;  title (' \bfSignal  Mixture  1');  xlabel (' discrete 
time');  ylabel (' Voltage ' ) 

subplot (3, M, 5) ;  plot (mixture (;, 2 )) ;  title (' \bfSignal  Mixture  2');  xlabel (' discrete 
time');  ylabel (' Voltage ' ) 

subplot (3, M, 6) ;  plot (mixture (;, 3 )) ;  title (' \bfSignal  Mixture  3');  xlabel (' discrete 
time ' ) ;  ylabel ( 'Voltage ' ) 

%Plot  extracted  signals 

subplot (3, M, 7) ;  plot(yl);  title (' \bfExtracted  Signal  1');  xlabel (' discrete  time'); 
ylabel ( 'Voltage ' ) 

subplot (3, M, 8) ;  plot(y2);  title (' \bfExtracted  Signal  2');  xlabel (' discrete  time'); 
ylabel ( 'Voltage' ) 

subplot (3, M, 9) ;  plot(y3);  title (' \bfExtracted  Signal  2');  xlabel (' discrete  time'); 
ylabel ( 'Voltage' ) 

%Plot  change  in  h  and  gradient  magnitude  during  optimization 
figure (3); 

subplot  (3, 1, 1) ;  plot(hs);  title (' \bf Function  Values  -  Entropy'); 

xlabel ( ' Iteration ' ) ;  ylabel ( ' h ( Y) '); 
subplot (3, 1/2) ;  plot(grads);  title (' \bfMagnitude  of  Entropy  Gradient'); 

xlabel (' Iteration ') ;  ylabel (' Gradient  Magnitude'); 
subplot (3, 1, 3) ;  plot(etas);  title (' \bfMagnitude  of  Eta'); 

xlabel (' Iteration ') ;  ylabel ('Eta  Magnitude'); 

%  %  Plot  PDFs  of  "Random"  Signals  (Optional) 

%  figure (2 ); subplot (2 , 1 , 1 ) ;  hist (si , N/ 100 ) ;  title (' \bf PDF  of  Signal  1') 

%  figure (2 ); subplot (2 , 1 , 2 ) ;  hist ( s2 , N/ 100 ) ;  title (' \bf PDF  of  Signal  2') 


%  CHECK  % 


w;  %  Mixing  Matrix 
W  %  Unmixing  Matrix 

I  =  w*W  %  should  yield  Identity  matrix 
%  Compute  optimum  h 
pdfopt  =  polarNRZpdf (s, sigma) ; 
detWopt  =  abs (det (inv (w) ) ) ; 

hopt  =  ( 1/N) *sum (sum (log (eps+pdfopt) ) )  +  log(detWopt) 

h  =  max(hs) 

index 
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APPENDIX  N.  GRADIENT  ASCENT  FUNCTION  (POLAR  NRZ 

TANH  MODEL) 


%  GRADIENT  ASCENT  for  POLAR  NRZ  signals 

%  This  function  performs  a  Gradient  Ascent  algorithm  that  varies  its  "step" 

%  size  as  a  maximum  is  approached  -  input  cdf,  pdf,  and  dpdf  required! 

%  eta  =  initial  step  size,  alpha  =  increased  step  size,  beta  =  decreased  step  size 
%  CDF,  PDF,  DPDF  INPUTS:  modified  hyperbolic  tangent  function  for  approximation 
%  [y, hs, grads, etas, W]  =  GradientAscent_polarNRZ2 (N, maxiter, alpha, beta, eta, x, W, sigma) 


function [y,  hs,  grads,  etas,  W]  =  GradientAscent_highkurtosis (N,  maxiter,  alpha,  beta, 
eta,  x,  W,  sigma) 

%  Create  arrays  to  store  values  of  h,  grad,  eta,  and  W 

hs  =  zeros (maxiter , 1 ) ;  gs  =  zeros (maxiter , 1 ) ;  etas  =  zeros (maxiter , 1 ) ; 

for  iter  =  limaxiter 
y  =  x  *  W ; 


%  INPUT  for  Gradient  Ascent  % 


%  Input  model  CDF  [Y=g(y)] 

Y  =  polarNRZcdf 2 (y, sigma) ; 

%  Input  model  PDF  [cdf ' =pdf=g ' (y) ] 
pdf  =  polarNRZpdf 2 (y, sigma) ; 

%  Input  PDF  derivative  [dpdf=g ' '  (y)  ] 
dpdf  =  polarNRZdpdf 2 (y, sigma) ; 


%  End  INPUT  -  more  in  'else'  below!  % 


psi  =  (dpdf)./  (eps  +  pdf); 
detw  =  abs (det (W) ) ; 

%  Calculate  entropy  for  current  iteration 
h  =  (1  /  N)  *  sum (sum (log (eps  +  pdf)))  +  log(detW); 


if  iter  >  1 

if  h  >  hs(iter  -  1)  %  (if  entropy 

eta  =  alpha  *  etas (iter  -  1) ; 
else  %h<hs (iter-1 ) :  (h  got  smaller 


increased) 


-  entropy  decreased) 
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W  =  Wold; 

eta  =  beta  *  etas (iter  -  1); 
y  =  x  *  W  ; 


%  INPUT  for  ELSE  loop  % 


%  Use  same  cdf,  pdf  as  above!  % 


%  Input  model  CDF 
Y  =  polarNRZcdf 2 (y, sigma) ; 

%  Input  model  PDF 

pdf  =  polarNRZpdf 2 (y, sigma) ; 

%  Input  PDF  derivative 

dpdf  =  polarNRZdpdf 2 (y, sigma) ; 


%  End  INPUT 


%Recalculate  detw,  h,  psi  for  Wold 
detw  =  abs(det(W)); 

h  =  (1  /  N)  *  sum (sum (log (eps  +  pdf)))  +  log(detw); 
psi  =  (dpdf)./  (eps  +  pdf ) ; 

end 

else  %  iter  <=  1 

h  =  h; 

end 

grad  =  inv(W')  +  (1  /  N)  *  (x1  *  psi)  ; 

W  =  W  +  eta  *  grad; 

%Record  h,  grad,  eta,  W 

hs(iter)  =  h;  grads (iter)  =  norm (grad (:)) ;  etas (iter)  =  eta; 

end 

%OUTPUTS  y,  hs,  grads,  etas 

y  =  y;  hs  =  hs;  grads  =  grads;  etas  =  etas;  W  =  W; 


Wold  =  W 
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